DIFFERENTIALLY REGULATED HEPATOCELLULAR CARCINOMA GENES AND USES THEREOF

CROSS-REFERENCE TO RELATED APPLICATIONS 
This application claims the benefit of U.S. provisional application number 
60/475,508, filed June 4, 2003, which is hereby incorporated by reference.

FIELD OF THE INVENTION 
The invention is in the field of diagnostics and therapeutics for cancer. More
specifically, the invention is in the field of diagnostics and therapeutics for
hepatocellular carcinoma.

BACKGROUND OF THE INVENTION 
Hepatocellular carcinoma (HCC) is the most common primary malignant
tumor of the liver and accounts for more than 70% of liver cancers worldwide (Parkin
et al., 1999). Many risk factors have been associated with the development of HCC,
including hepatitis B (HBV) and hepatitis C (HCV) viral infection, cirrhosis, male
gender, exposure to toxins, etc. Death generally occurs due to liver failure associated
with cirrhosis and/or rapid outgrowth of multiple nodules. Approximately 0.25-1
million new cases of HCC are diagnosed each year, and the cancer is especially
prevalent in Southeast Asia, China, and sub-Saharan Africa. While surgical resection
is considered to be the main curative treatment, only 10-15% of cases are suitable for
surgery at the time of presentation. This is because either the disease is detected at an
advanced stage at presentation or the underlying poor liver functional reserve
precludes surgical intervention.

Diagnosis of HCC has included detection of the presence of a liver mass on
radiological investigations and the detection of elevated serum alpha fetoprotein
(AFP) levels (Yu and Keeffe, 2003). However, elevation of AFP is not exclusive to
HCC and has been observed in benign hepatic disease, such as liver cirrhosis, and
other cancers such as germ cell cancer (Bosl and Head, 1994). Treatment of HCC has
included interferon therapy and antiviral drugs, but the results have proved
unpredictable and the effectiveness may be limited (Lee, 1997; Yu and Keeffe, 2003).
Microarrays have been used to address changes in gene expression of HCC (Chen et
al., 2002; Okabe et al., 2001; Honda et al., 2001; Shirota et al., 2001; Tackels-Horne
et al., 2001; Xu et al., 2001a; Xu et al., 2001b). However, these reports were
restricted to the tissue samples selected for each study and exhibited wide variation
in the results, thus limiting the potential significance and utility of the data.

WO 2004/108964
PCT/SG2004/000166

SUMMARY OF THE INVENTION 

The invention provides, in part, molecular markers for hepatocellular carcinoma
(HCC) that may be used for HCC diagnosis, to assess HCC progression or regression,
or the efficacy and/or toxicity of HCC therapeutics, and/or to identify candidate
compounds for HCC therapy, with high predictive accuracy.

In one aspect, the invention provides a composition including an addressable
collection of two or more nucleic acid molecules, or polypeptides encoded by these
nucleic acid molecules, that are differentially expressed in hepatocellular carcinoma,
where the nucleic acid molecules consist essentially of the nucleic acid molecules set
forth in any one or more of Tables 1 through 4 or complements, fragments, variants,
or analogs thereof. The composition may include all of the nucleic acid molecules,
or the encoded polypeptides, set forth in any one or more of Tables 1 through 4 or
complements, fragments, variants, or analogs thereof, or any subset of these nucleic
acid molecules or polypeptides. The nucleic acid molecules or polypeptides may be
differentially expressed between hepatocellular carcinoma tissue and non-tumor
tissue. The nucleic acid molecules or the polypeptides may be attached to a solid
support. The compositions may be used in the preparation of a medicament for
diagnosis or therapy of hepatocellular carcinoma.

In other aspects, the invention provides a method of diagnosing hepatocellular
carcinoma in a subject by obtaining a sample from the subject and detecting the level
of expression of two or more nucleic acid molecules or expression products thereof in
the sample, where the nucleic acid molecules consist essentially of the nucleic acid
molecules set forth in any one or more of Tables 1 through 4 or complements,
fragments, variants, or analogs thereof.

In other aspects, the invention provides a method of monitoring the progression of
hepatocellular carcinoma in a subject by obtaining a sample from the subject and
detecting the level of expression of two or more nucleic acid molecules or expression
products thereof in the sample, where the nucleic acid molecules consist essentially of
the nucleic acid molecules set forth in any one or more of Tables 1 through 4, or
complements, fragments, variants, or analogs thereof. The sample may be obtained at
two or more time points. The method may further include comparing the level of
expression of the nucleic acid molecules or expression products at two or more time
points.

In other aspects, the invention provides a method of monitoring the efficacy of
a hepatocellular carcinoma therapy in a subject by administering the therapy to the
subject, obtaining a sample from the subject, and detecting the level of expression of
two or more nucleic acid molecules or expression products thereof in the sample,
where the nucleic acid molecules consist essentially of the nucleic acid molecules set
forth in any one or more of Tables 1 through 4, or complements, fragments, variants,
or analogs thereof. The therapy may be administered at two or more administration
time points. The sample may be obtained at two or more sampling time points. The
method may further include comparing the level of expression of the nucleic acid
molecules or expression products at two or more administration time points, and/or at
two or more sampling time points.
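
As an illustration only, the comparison of expression levels across sampling time
points could be sketched as follows. The function name, the data layout, and the
marker labels are hypothetical and are not taken from the specification; the values
are assumed to be on a common scale (for example, log base 2 expression).

```python
def expression_change(levels_by_time):
    """Compare marker expression between the first and last sampling time point.

    levels_by_time: chronologically ordered list of (label, {marker: level}).
    Returns {marker: last_level - first_level} for markers measured at both
    time points. The layout is an assumption made for this sketch.
    """
    if len(levels_by_time) < 2:
        raise ValueError("at least two sampling time points are required")
    first = levels_by_time[0][1]
    last = levels_by_time[-1][1]
    return {marker: last[marker] - first[marker]
            for marker in first if marker in last}


# Hypothetical measurements before and after a therapy is administered.
series = [
    ("baseline", {"marker_a": 1.0, "marker_b": 2.0}),
    ("week_12", {"marker_a": 3.0, "marker_b": 1.5}),
]
changes = expression_change(series)
```

A positive value indicates that the level rose between the two time points and a
negative value that it fell; how such a change is interpreted depends on the marker.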

In other aspects, the invention provides a method of screening a compound for
treating hepatocellular carcinoma by contacting a sample with a test compound and
detecting the level of expression of two or more nucleic acid molecules or expression
products thereof in the sample, where the nucleic acid molecules consist essentially of
the nucleic acid molecules set forth in any one or more of Tables 1 through 4, or
complements, fragments, variants, or analogs thereof.

In alternate embodiments of the various aspects, the sample may be liver or
serum, or may be suspected of being cancerous, or may be non-cancerous. The
methods may further include comparing the level of expression of the nucleic acid
molecules or expression products thereof in a non-cancerous sample and in a sample
suspected of being cancerous. Differential expression of the nucleic acid molecules or
expression products thereof may be indicative of hepatocellular carcinoma, or of
progression of hepatocellular carcinoma, or of the efficacy of the hepatocellular
carcinoma therapy. The subject may be suspected of having hepatocellular
carcinoma. The subject may be a human.

In alternate embodiments of the various aspects, the method may further
include comparing the level of expression of two or more nucleic acid molecules or
expression products thereof with a standard, or further include preparing a gene
expression profile. The method may be a high throughput method.

In other aspects, the invention provides a solid support including two or more
nucleic acid molecules or polypeptides encoded by these nucleic acid molecules that
are differentially expressed in hepatocellular carcinoma, where the nucleic acid
molecules consist essentially of the nucleic acid molecules set forth in Tables 1
through 4 or complements, fragments, variants, or analogs thereof. The nucleic acid
molecules may consist essentially of all the nucleic acid molecules set forth in any
one or more of Tables 1 through 4, and/or be differentially expressed between
hepatocellular carcinoma tissue and non-tumor tissue. The polypeptides may consist
essentially of the polypeptides encoded by all the nucleic acid molecules set forth in
any one or more of Tables 1 through 4, and/or be differentially expressed between
hepatocellular carcinoma tissue and non-tumor tissue. The nucleic acid molecules or
the polypeptides may be covalently or non-covalently attached to the solid support
(e.g., a microarray).

In other aspects, the invention provides a database including information
identifying the expression level in liver tissue (e.g., cancerous or non-cancerous
tissue) of two or more nucleic acid molecules or expression products thereof, where
the nucleic acid molecules consist essentially of the nucleic acid molecules set forth in
any one or more of Tables 1 through 4, or complements, fragments, variants, or
analogs thereof.

A "composition" as used herein includes a plurality of the nucleic acid
molecules described herein, including complements, analogs, variants, and fragments
thereof. A composition as used herein also includes a plurality of polypeptides
encoded by the nucleic acid molecules described herein, and complements, analogs,
variants, and fragments thereof. A composition as used herein also includes a plurality
of polypeptides capable of specifically binding to the polypeptides or nucleic acid
molecules described herein (e.g., antibodies). The composition may include any
combination of the nucleic acid molecules described herein, including complements,
analogs, variants, and fragments thereof, or polypeptides encoded by these nucleic
acid molecules. Accordingly, the composition may include 2, 3, 4, 5, 6, 7, 8, 9, 10,
etc., up to all of the nucleic acid molecules or polypeptides described herein, e.g., in
any one or more of the Tables or Figures herein. In some embodiments, the
composition may include subsets of the nucleic acid molecules or polypeptides
described herein, e.g., in any one or more of the Tables or Figures herein, for
example, subsets grouped by protein function or characteristics, e.g., proteins involved
in the ubiquitination pathway, or proteins localized to a particular cellular
compartment. These nucleic acid molecules or polypeptides may for example be used
with a substrate (e.g., a solid substrate or a liquid substrate) in a variety of
applications, including the diagnosis of HCC, or monitoring the progression of HCC.

By "addressable collection" is meant a combination of nucleic acid molecules
or polypeptides capable of being detected by, for example, the use of hybridization
techniques or antibody binding techniques or by any other means of detection known
to those of ordinary skill in the art.

The terms "nucleic acid" or "nucleic acid molecule" encompass both RNA
(plus and minus strands) and DNA, including cDNA, genomic DNA, and synthetic
(e.g., chemically synthesized) DNA. The nucleic acid may be double-stranded or
single-stranded. Where single-stranded, the nucleic acid may be the sense strand or
the antisense strand. A nucleic acid molecule may be any chain of two or more
covalently bonded nucleotides, including naturally occurring or non-naturally
occurring nucleotides, or nucleotide analogs or derivatives. By "RNA" is meant a
sequence of two or more covalently bonded, naturally occurring or modified
ribonucleotides. One example of a modified RNA included within this term is
phosphorothioate RNA. By "DNA" is meant a sequence of two or more covalently
bonded, naturally occurring or modified deoxyribonucleotides. By "cDNA" is meant
complementary or copy DNA produced from an RNA template by the action of
RNA-dependent DNA polymerase (reverse transcriptase). Thus a "cDNA clone" means a
duplex DNA sequence complementary to an RNA molecule of interest, carried in a
cloning vector. An "oligonucleotide" as used herein is a single stranded molecule
which may be used in hybridization or amplification technologies. In general, an
oligonucleotide may be any integer from about 15 to about 100 nucleotides in length,
but may also be of greater length. A "probe" or "primer" is a single-stranded DNA
or RNA molecule of defined sequence that can base pair to a second DNA or RNA
molecule that contains a complementary sequence (the target). The stability of the
resulting hybrid molecule depends upon the extent of the base pairing that occurs, and
is affected by parameters such as the degree of complementarity between the probe
and target molecule, and the degree of stringency of the hybridization conditions. The
degree of hybridization stringency is affected by parameters such as the temperature,
salt concentration, and concentration of organic molecules, such as formamide, and is
determined by methods that are known to those skilled in the art. Probes or primers
specific for the nucleic acid sequences described herein, or portions thereof, may vary
in length by any integer from at least 8 nucleotides to over 500 nucleotides, including
any value in between, depending on the purpose for which, and conditions under
which, the probe or primer is used. For example, a probe or primer may be 8, 10, 15,
20, or 25 nucleotides in length, or may be at least 30, 40, 50, or 60 nucleotides in
length, or may be over 100, 200, 500, or 1000 nucleotides in length. Probes or
primers specific for the nucleic acid molecules described herein may have greater than
any integer between 20-30% sequence identity, or at least any integer between
55-75% sequence identity, or at least any integer between 75-85% sequence identity, or
at least any integer between 85-99% sequence identity, or 100% sequence identity to
the nucleic acid sequences described herein. Probes or primers can be detectably
labeled, either radioactively or non-radioactively, by methods that are known to those
skilled in the art. Probes or primers can be used for methods involving nucleic acid
hybridization, such as nucleic acid sequencing, nucleic acid amplification by the
polymerase chain reaction, single stranded conformational polymorphism (SSCP)
analysis, restriction fragment polymorphism (RFLP) analysis, Southern hybridization,
northern hybridization, in situ hybridization, electrophoretic mobility shift assay
(EMSA), microarray, and other methods that are known to those skilled in the art.
Probes or primers may be derived from genomic DNA or cDNA, for example, by
amplification, or from cloned DNA segments, or may be chemically synthesized.
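
The notions of a complementary target and of percent sequence identity used above
can be illustrated with a minimal sketch. It assumes ungapped DNA sequences of equal
length compared position by position; practical identity calculations normally rely
on a sequence alignment, which is not shown here. The function names and the example
sequences are hypothetical.

```python
# Watson-Crick pairing for DNA bases (RNA and ambiguity codes are not handled).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand that base pairs with seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences match."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


probe = "ATGCATGCAT"                       # hypothetical 10-nucleotide probe
target_site = reverse_complement(probe)    # sequence the probe would pair with
identity = percent_identity(probe, "ATGCATGCAA")
```

Here the second sequence differs from the probe at one of ten positions, so the
identity is 90%.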
The "expression product" of a nucleic acid molecule may be any polypeptide
encoded by that nucleic acid molecule. Generally, the polypeptide is capable of being
expressed.

A "protein," "peptide" or "polypeptide" is any chain of two or more amino
acids, including naturally occurring or non-naturally occurring amino acids or amino
acid analogues, regardless of post-translational modification (e.g., glycosylation or
phosphorylation). An "amino acid sequence", "polypeptide", "peptide" or "protein" of
the invention may include peptides or proteins that have abnormal linkages, cross
links and end caps, non-peptidyl bonds or alternative modifying groups. Such
modified peptides are also within the scope of the invention. The term "modifying
group" is intended to include structures that are directly attached to the peptidic
structure (e.g., by covalent coupling), as well as those that are indirectly attached to
the peptidic structure (e.g., by a stable non-covalent association or by covalent
coupling to additional amino acid residues, or mimetics, analogues or derivatives
thereof, which may flank the core peptidic structure). For example, the modifying
group can be coupled to the amino-terminus or carboxy-terminus of a peptidic
structure, or to a peptidic or peptidomimetic region flanking the core domain.
Alternatively, the modifying group can be coupled to a side chain of at least one
amino acid residue of a peptidic structure, or to a peptidic or peptidomimetic region
flanking the core domain (e.g., through the epsilon amino group of a lysyl residue(s),
through the carboxyl group of an aspartic acid residue(s) or a glutamic acid residue(s),
through a hydroxy group of a tyrosyl residue(s), a serine residue(s) or a threonine
residue(s) or other suitable reactive group on an amino acid side chain). Modifying
groups covalently coupled to the peptidic structure can be attached by means and
using methods well known in the art for linking chemical structures, including, for
example, amide, alkylamino, carbamate or urea bonds. Peptides according to the
invention may include peptides encoded by the nucleic acid molecules of Tables 1
through 4 or complements or analogs thereof.

By "differential expression" or "differentially expressed" is meant increased,
upregulated or present, or decreased, downregulated or absent, gene expression as
detected by the absence, presence, or change (up or down) in the amount of
transcribed messenger RNA or translated protein in a sample. For example, the
change may be detected by comparison of the change in gene expression level
between a HCC sample and a non-tumor sample. The absolute amount of change of
gene expression is not important, as long as the amount of change is reproducible and
measurable. In some embodiments, the change (up or down) in the amount of
transcribed messenger or translated protein may be at least 1-fold or at least 1.5-fold
or may be over 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, or 5.0-fold. In some embodiments, the
change in the amount of transcribed messenger or translated protein may be 40%,
50%, 60%, 70%, 80%, 90%, or 100%.
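
A minimal sketch of how a fold change between a HCC sample and a non-tumor sample
might be computed and compared with one of the thresholds mentioned above. The
function names are hypothetical, the inputs are assumed to be positive expression
measurements on a linear scale, and the 1.5-fold default is only one of the values
listed in this paragraph.

```python
def fold_change(tumor, non_tumor):
    """Return (direction, magnitude) for tumor relative to non-tumor expression.

    The magnitude is always >= 1, so a halving is reported as ("down", 2.0).
    """
    if tumor <= 0 or non_tumor <= 0:
        raise ValueError("expression values must be positive")
    ratio = tumor / non_tumor
    if ratio > 1:
        return ("up", ratio)
    if ratio < 1:
        return ("down", 1 / ratio)
    return ("unchanged", 1.0)


def is_differential(tumor, non_tumor, threshold=1.5):
    """True when the change, up or down, is at least `threshold`-fold."""
    _, magnitude = fold_change(tumor, non_tumor)
    return magnitude >= threshold


up = fold_change(300.0, 100.0)      # threefold higher in the tumor sample
down = fold_change(50.0, 100.0)     # twofold lower in the tumor sample
```

Reporting the direction separately from the magnitude keeps up- and downregulation
symmetric, which matches the "up or down" wording of the definition.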

By "detecting" it is intended to include determining the presence or absence,
or quantifying the amount, of a substance, e.g., a nucleic acid molecule or polypeptide
of the invention. The term thus refers to the use of the materials, compositions, and
methods of the present invention for qualitative and quantitative determinations. For
example, detecting an increase in gene expression levels may include quantifying a
change of any value between 10% and 90%, or of any value between 30% and 60%,
or over 100%, of any of the nucleic acid molecules or polypeptides of the invention
when compared to a control. In other embodiments, detecting an increase in gene
expression levels may include quantifying a change of any value between 1 to 5 fold
or more of any of the nucleic acid molecules or polypeptides of the invention when
compared to a control.

"Hepatocellular carcinoma" is cancer that arises from hepatocytes, the major
cell type of the liver. It is a form of adenocarcinoma, and is the most common type
of liver tumor. "Non-tumor" tissue refers to tissue or cells that are non-cancerous. In
some embodiments, non-tumor tissue may include tissue or cells from a subject
having a liver disorder, such as HBV or HCV infection, cirrhosis, exposure to
aflatoxins, etc. The phrase "suspected of being cancerous" as used herein means a
HCC tissue sample believed by one of ordinary skill in the art to contain HCC cells.
By "non-cancerous" or "non-tumor" is meant a tissue sample demonstrated by
standard diagnostic or other techniques (e.g., histologic staining, microscopic
analysis, immunoassay, etc.) to contain no HCC cells or evidence of HCC.

A "sample" can be any organ, tissue, cell, or cell extract isolated from a
subject, such as a sample isolated from a mammal having a hepatocellular carcinoma
or isolated from a mammal not having a hepatocellular carcinoma or a tumor. For
example, a sample can include, without limitation, tissue such as liver tissue (e.g.,
from a biopsy or autopsy), cells, peripheral blood, whole blood, red cell concentrates,
platelet concentrates, leukocyte concentrates, blood cell proteins, blood plasma,
platelet-rich plasma, a plasma concentrate, a precipitate from any fractionation of the
plasma, a supernatant from any fractionation of the plasma, blood plasma protein
fractions, purified or partially purified blood proteins or other components, serum,
semen, mammalian colostrum, milk, urine, stool, saliva, placental extracts, amniotic
fluid, a cryoprecipitate, a cryosupernatant, a cell lysate, mammalian cell culture or
culture medium, products of fermentation, ascitic fluid, proteins present in blood
cells, solid tumours isolated from a mammal with a hepatocellular carcinoma, or any
other specimen, or any extract thereof, obtained from a patient (human or animal), test
subject, or experimental animal. A sample may also include, without limitation,
products produced in cell culture by normal, non-tumor, or transformed cells (e.g., via
recombinant DNA technology). A "sample" may also be a cell or cell line created
under experimental conditions that is not directly isolated from a subject. A sample
can also be cell-free, artificially derived or synthesised. In some embodiments,
samples refer to liver tissue or cells. In some embodiments, the liver tissue may be
from a subject having a hepatocellular carcinoma; a subject infected with a hepatitis
virus; a subject having a liver disorder, e.g., cirrhosis; or a subject having a normal
liver, e.g., not diagnosed with or suspected of having a liver disorder.

As used herein, a subject may be a human, non-human primate, rat, mouse,
cow, horse, pig, sheep, goat, dog, cat, etc. The subject may be a clinical patient, a
clinical trial volunteer, an experimental animal, etc. The subject may be suspected of
having HCC, be diagnosed with HCC, or be a control subject that is confirmed to not
have HCC. Diagnostic methods for HCC and the clinical delineation of HCC
diagnoses are known to those of ordinary skill in the art, and include biopsy including
radiological biopsy by means of a radiological scan, laparoscopy, or open surgical
biopsy.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a plot showing natural patterns of gene expression differences
between HCC tumor and non-tumor liver tissue specimens based on unsupervised
clustering. The plot shows the variance of expression value for each of the gene
features across all the HCC tumor and non-tumor liver tissue specimens. The dotted
line indicates the 500 most variable gene features.
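
Ranking gene features by the variance of their expression values across specimens,
as plotted in Figure 1, could be sketched as follows. This is an illustration under
stated assumptions: the population variance is used here, whereas the variance
estimator actually used for the figure is not stated, and the feature names and
values are hypothetical.

```python
from statistics import pvariance


def most_variable_features(expression, k):
    """Return the k feature names with the largest variance across specimens.

    expression: {feature_name: [value in specimen 1, value in specimen 2, ...]}
    """
    variances = {name: pvariance(values) for name, values in expression.items()}
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:k]


# Hypothetical log-scale expression values for three features in three specimens.
example = {
    "feature_1": [1.0, 1.0, 1.0],   # constant, variance 0
    "feature_2": [0.0, 4.0, 8.0],   # most variable
    "feature_3": [1.0, 2.0, 3.0],
}
top_two = most_variable_features(example, 2)
```

With k set to 500 the same function would select the feature set indicated by the
dotted line in the figure.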

Figure 2 is a multidimensional scaling plot showing significant gene
differential expression between HCC tumor and non-tumor liver tissues, and
comparison with liver cancer cell lines (P<lxlO'^, approximately 1.5-fold change).
The plot illustrates the ability of the 218 outlier genes to separate HCC tumor
specimens (black circles) from non-tumor liver tissue specimens (dark gray circles).
The plot also shows how different liver cancer cell lines (light gray circles) are from
the clinical tissue samples.

Figures 3A-B characterize differentially expressed genes in HCC tumor
specimens (P<lxlO'^, approximately 1.5-fold change). Figure 3A is a bar graph
showing the chromosomal distribution of the 218 outlier genes. The dark colored and
light shaded bars represent genes that are at least 1.5-fold up- and downregulated,
respectively, in HCC tumors relative to non-tumor livers. Figure 3B is a bar graph
showing the functional characterization of the outlier genes based on Gene Ontology
and published works.

Figure 4 is a bar graph showing the expression of BMI-1 in HCC tumors as
determined by cDNA microarray analysis. The data are presented as the level of
expression (log base 2) in each HCC tumor specimen with respect to the
corresponding non-tumor liver sample.

Figures 5A-D show real-time RT-PCR analysis of IGFBP3, ERBB3, ERBB2
and EGFR in HCC tumor samples. The gene expression patterns for (A) IGFBP3,
(B) ERBB3, (C) ERBB2 and (D) EGFR in all the 37 HCC tumor samples and their
corresponding non-tumor liver tissue specimens were examined. All data were
normalized to the amount of 'housekeeping' gene PBGD and are presented as relative
fold expression change (log base 2 ratio) in HCC tumor specimens with respect to
their corresponding non-tumor liver counterpart. A positive value depicts a
higher expression level, while a negative value depicts a lower expression level in
the tumor relative to the non-tumor specimen.
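
The normalization and log base 2 ratio described for Figures 5A-D can be written
out as a short sketch. It assumes that each input is a positive abundance estimate
on a linear scale for the target gene and for the housekeeping gene in the same
specimen; how those estimates are obtained from the real-time RT-PCR data is not
covered, and the function and parameter names are hypothetical.

```python
import math


def log2_relative_expression(tumor_target, tumor_ref, normal_target, normal_ref):
    """log2 of housekeeping-normalized tumor expression over non-tumor expression.

    tumor_ref and normal_ref are the housekeeping gene amounts (PBGD in the
    figure description) measured in the tumor and non-tumor specimen.
    """
    values = (tumor_target, tumor_ref, normal_target, normal_ref)
    if any(v <= 0 for v in values):
        raise ValueError("all amounts must be positive")
    tumor_norm = tumor_target / tumor_ref
    normal_norm = normal_target / normal_ref
    return math.log2(tumor_norm / normal_norm)


higher = log2_relative_expression(8.0, 1.0, 2.0, 1.0)   # fourfold higher in tumor
lower = log2_relative_expression(1.0, 2.0, 4.0, 2.0)    # fourfold lower in tumor
```

On this scale a value of 2 corresponds to a fourfold increase and a value of -2 to a
fourfold decrease in the tumor relative to the matched non-tumor specimen.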
Figure 6 lists a panel of genes analyzed using real-time and semi-quantitative
RT-PCR analyses, and indicates whether the analysis was conducted in non-tumor
human tissues, and in clinical tissue samples or HCC cell lines or both.

Figures 7A-C show gene expression analysis of ARMET. Semi-quantitative
RT-PCR analysis (A) of non-tumor tissues and HCC cell lines and real time RT-PCR
analysis of non-tumor tissues (B) and patient samples (C) was performed. GAPDH
expression level was used as the control in the analyses of HCC cell lines vs.
non-tumor human tissues (A). The dotted line in (B) indicates mean expression value of
four non-tumor liver tissues, i.e., Fetal/F, Fetal/M, Adult/F, Adult/M.

Figures 8A-C show gene expression analysis of BMI-1. Semi-quantitative
RT-PCR analysis (A) of non-tumor tissues and HCC cell lines and real time RT-PCR
analysis of non-tumor tissues (B) and patient samples (C) was performed. GAPDH
expression level was used as the control in the analyses of HCC cell lines vs.
non-tumor human tissues (A). The dotted line in (B) indicates mean expression value of
four non-tumor liver tissues, i.e., Fetal/F, Fetal/M, Adult/F, Adult/M.

Figures 9A-C show gene expression analysis of CRHBP. Semi-quantitative
RT-PCR analysis (A) of non-tumor tissues and HCC cell lines and real time RT-PCR
analysis of non-tumor tissues (B) and patient samples (C) was performed. GAPDH
expression level was used as the control in the analyses of HCC cell lines vs.
non-tumor human tissues (A). The dotted line in (B) indicates mean expression value of
four non-tumor liver tissues, i.e., Fetal/F, Fetal/M, Adult/F, Adult/M.

Figures 10A-C show gene expression analysis of CSTB. Semi-quantitative
RT-PCR analysis (A) of non-tumor tissues and HCC cell lines and real time RT-PCR
analysis of non-tumor tissues (B) and patient samples (C) was performed. GAPDH
expression level was used as the control in the analyses of HCC cell lines vs.
non-tumor human tissues (A). The dotted line in (B) indicates mean expression value of
four non-tumor liver tissues, i.e., Fetal/F, Fetal/M, Adult/F, Adult/M.

Figures 11A-C show gene expression analysis of DPT. Semi-quantitative
RT-PCR analysis (A) of non-tumor tissues and HCC cell lines and real time RT-PCR
analysis of non-tumor tissues (B) and patient samples (C) was performed. GAPDH
expression level was used as the control in the analyses of HCC cell lines vs.
non-tumor human tissues (A). The dotted line in (B) indicates mean expression value of
four non-tumor liver tissues, i.e., Fetal/F, Fetal/M, Adult/F, Adult/M.

Figures 12A-B show gene expression analysis of ERBB3. Real time RT-PCR 
analysis of non-tumor tissues (A) and patient samples (B) was performed. The dotted 
line in (A) indicates mean expression value of four non-tumor liver tissues i.e.,
Fetal/F, Fetal/M, Adult/F, Adult/M. 

Figures 13A-B show gene expression analysis of EZH2. Real time RT-PCR
analysis of non-tumor tissues (A) and patient samples (B) was performed. The dotted 
line in (A) indicates mean expression value of four non-tumor liver tissues i.e., 
Fetal/F, Fetal/M, Adult/F, Adult/M. 

Figures 14A-B show gene expression analysis of GPC3. Real time RT-PCR 
analysis of non-tumor tissues (A) and patient samples (B) was performed. The dotted 
line in (A) indicates mean expression value of four non-tumor liver tissues i.e., 
Fetal/F, Fetal/M, Adult/F, Adult/M. 

Figures 15A-B show gene expression analysis of HDGF. Real time RT-PCR 
analysis of non-tumor tissues (A) and patient samples (B) was performed. The dotted 
line in (A) indicates mean expression value of four non-tumor liver tissues i.e.,
Fetal/F, Fetal/M, Adult/F, Adult/M.

Figures 16A-B show gene expression analysis of MDK. Real time RT-PCR
analysis of non-tumor tissues (A) and patient samples (B) was performed. The dotted
line in (A) indicates mean expression value of four non-tumor liver tissues i.e.,
Fetal/F, Fetal/M, Adult/F, Adult/M.

Figure 17 shows gene expression analysis of D123. Semi-quantitative RT-PCR
analysis of non-tumor tissues and HCC cell lines was performed. GAPDH
expression level was used as the control in the analyses of HCC cell lines vs. non- 
tumor human tissues. 

Figure 18 shows gene expression analysis of FLJ10326. Semi-quantitative
RT-PCR analysis of non-tumor tissues and HCC cell lines was performed. GAPDH
expression level was used as the control in the analyses of HCC cell lines vs. non- 
tumor human tissues. 

Figure 19 shows gene expression analysis of ICA-1A. Semi-quantitative RT-PCR
analysis of non-tumor tissues and HCC cell lines was performed. GAPDH
expression level was used as the control in the analyses of HCC cell lines vs.
non-tumor human tissues.

Figure 20 shows gene expression analysis of LASP1. Semi-quantitative RT-PCR
analysis of non-tumor tissues and HCC cell lines was performed. GAPDH
expression level was used as the control in the analyses of HCC cell lines vs.
non-tumor human tissues.

Figure 21 shows gene expression analysis of PODXL. Semi-quantitative RT-PCR
analysis of non-tumor tissues and HCC cell lines was performed. GAPDH
expression level was used as the control in the analyses of HCC cell lines vs.
non-tumor human tissues.

DETAILED DESCRIPTION OF THE INVENTION 
Phenotypic changes in cancer may be due to cellular changes at the nucleotide 
level. Thus, some genes may be expressed, overexpressed, or under-expressed in
tumor cells relative to non-tumor cells. However, a wide variation exists in gene
expression patterns among cancer patients, including HCC patients. Therefore,
examining the regulation or expression of a single gene or target, or even of multiple
genes or targets whose regulation or expression vary across different HCC tumors,
may be insufficient for accurate diagnosis or treatment of HCC or for screening of
HCC therapeutics. Selecting a set of differentially expressed HCC genes, nucleic acid
molecules, and/or polypeptides assists in predictable and accurate diagnosis and
therapy, and design of efficacious therapeutics.

The invention provides, in part, nucleic acid molecules and polypeptides that
are differentially expressed in HCC cells, when compared to non-HCC tissue, e.g.,
liver or serum. Thus, the invention provides, in part, molecular markers for HCC
derived from the analysis of global changes in gene expression ("gene expression
profiles") between HCC tissue and non-HCC tissue. More specifically, cDNA
microarrays were used to examine the global cellular changes in matched pairs of
HCC tumor and non-tumor tissues of patients diagnosed with HCC. In addition, gene
expression patterns between primary HCC tumors and liver cancer cell lines were
examined for possible biological variation.

The nucleic acid molecules or polypeptides provided by the invention, as well
as subsets thereof, serve as molecular markers that may be used for example for HCC
diagnosis; to assess HCC progression or regression; to assess the efficacy and/or
toxicity of HCC therapeutics; and/or to identify candidate compounds for HCC
therapy, with high predictive accuracy. The gene lists identified permit rapid, simple,
and reproducible screening of a variety of HCC samples by, for example, nucleic acid
microarray hybridization or protein expression technology to determine the
expression of the specific genes, or by other means such as differential display, gel
electrophoresis, genome mismatch scanning, representational discriminate analysis,
clustering, transcript imaging, etc., used singly or in combination. Thus, the selected
nucleic acid molecules or polypeptides of the invention define standard and
reproducible differential expression patterns against which to compare the expression
pattern in a variety of tissue or cells, e.g., HCC tissue or cells or serum, obtained by
biopsy, autopsy, or from cell lines and/or in vitro treatment or assays. The selected
nucleic acid molecules or polypeptides of the invention and subsets thereof provide
reliable detection of HCC cells or tissue, with reduction or elimination of false
positives or false negatives. In some embodiments, the invention provides composite
sets of discriminator genes for use as general or global HCC tumor markers. In some
embodiments, the nucleic acid molecules or polypeptides of the invention may be
used to assess the suitability of a HCC cell line for use as a model for HCC, as gene
expression profiles may vary between primary HCC tumors and HCC cell lines.

Various alternative embodiments and examples of the invention are described
herein. These embodiments and examples are illustrative and should not be construed
as limiting the scope of the invention.

Nucleic Acid Molecules, Polypeptides, And Test Compounds

Compounds according to the invention include, without limitation, molecules
substantially identical to the nucleic acid molecules of Tables 1 through 4 (e.g.,
BMI-1, ARMET, CRHBP, CSTB, DPT, ERBB3, EZH2, GPC3, HDGF, MDK, D123,
iaJ10326, IGAP-1 A, LASP1, PODXL) and complements, analogs, and
variants thereof, including, for example, the polypeptides described herein that are
encoded by the nucleic acid molecules of Tables 1 through 4, as well as homologs and
fragments thereof. In some embodiments of the invention, compounds of the
invention include antibodies that specifically bind to polypeptides encoded by the
nucleic acid molecules of Tables 1 through 4. An antibody "specifically binds" an
antigen when it recognises and binds the antigen, for example, a polypeptide encoded
by any of the nucleic acid molecules described herein, but does not substantially
recognise and bind other reference molecules in a sample, for example, a polypeptide
that is encoded by a nucleic acid molecule that is not substantially identical to any of
the nucleic acid molecules described herein. Such an antibody has, for example, an
affinity for the antigen which is at least 10, 100, 1000 or 10000 times greater than the
affinity of the antibody for another reference molecule in a sample.

A "substantially identical" sequence is an amino acid or nucleotide sequence
that differs from a reference sequence only by one or more conservative substitutions,
as discussed herein, or by one or more non-conservative substitutions, deletions, or
insertions located at positions of the sequence that do not destroy the biological
function of the amino acid or nucleic acid molecule, or that do not destroy the
detectability (e.g., by hybridization or specific binding) of the amino acid or nucleic
acid molecule. Such a sequence can be any integer percentage from 10% to 99%, or
more generally at least 10%, 20%, 30%, 40%, 50%, 55% or 60%, or at least 65%, 75%,
80%, 85%, 90%, or 95%, or as much as 96%, 97%, 98%, or 99% identical when optimally
aligned at the amino acid or nucleotide level to the sequence used for comparison
using, for example, the Align Program (Myers and Miller, CABIOS, 1989, 4:11-17)
or FASTA. For polypeptides, the length of comparison sequences may be at least 2,
5, 10, or 15 amino acids, or at least 20, 25, or 30 amino acids. In alternate
embodiments, the length of comparison sequences may be at least 35, 40, or 50 amino
acids, or over 60, 80, or 100 amino acids. For nucleic acid molecules, the length of
comparison sequences may be at least 5, 10, 15, 20, or 25 nucleotides, or at least 30,
40, or 50 nucleotides. In alternate embodiments, the length of comparison sequences
may be at least 60, 70, 80, or 90 nucleotides, or over 100, 200, or 500 nucleotides.
Sequence identity can be readily measured using publicly available sequence analysis
software (e.g., Sequence Analysis Software Package of the Genetics Computer Group,
University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison,
Wis. 53705, or BLAST software available from the National Library of Medicine, or
as described herein). Examples of useful software include the programs Pile-up and
PrettyBox. Such software matches similar sequences by assigning degrees of
homology to various substitutions, deletions, and other modifications.
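
As an illustration only, percent identity over an existing alignment can be
sketched as follows. This is not the Align Program, FASTA, or BLAST; it assumes
the two sequences have already been optimally aligned by such a tool and are
supplied as equal-length strings with "-" marking gaps. The function name and
the choice of denominator (all columns except gap-gap columns) are assumptions
made for this sketch, since alignment tools differ on that convention.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption for this sketch: columns in which both sequences carry a gap
    are ignored, and a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # uninformative column, skip it
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions in the alignment")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Ten aligned nucleotides with one mismatch.
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

The returned value could then be compared against whichever identity
threshold (for example, at least 90%) is of interest.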

Alternatively, or additionally, two nucleic acid sequences may be
"substantially identical" if they hybridize under high stringency conditions. In some
embodiments, high stringency conditions are, for example, conditions that allow
hybridization comparable with the hybridization that occurs using a DNA probe of at
least 500 nucleotides in length, in a buffer containing 0.5 M NaHPO4, pH 7.2, 7%
SDS, 1 mM EDTA, and 1% BSA (fraction V), at a temperature of 65°C, or a buffer
containing 48% formamide, 4.8x SSC, 0.2 M Tris-Cl, pH 7.6, 1x Denhardt's solution,
10% dextran sulfate, and 0.1% SDS, at a temperature of 42°C. (These are typical
conditions for high stringency northern or Southern hybridizations.) Hybridizations
may be carried out over a period of about 20 to 30 minutes, or about 2 to 6 hours, or
about 10 to 15 hours, or over 24 hours or more. High stringency hybridization is also
relied upon for the success of numerous techniques routinely performed by molecular
biologists, such as high stringency PCR, DNA sequencing, single strand
conformational polymorphism analysis, and in situ hybridization. In contrast to
northern and Southern hybridizations, these techniques are usually performed with
relatively short probes (e.g., usually about 16 nucleotides or longer for PCR or
sequencing and about 40 nucleotides or longer for in situ hybridization). The high
stringency conditions used in these techniques are well known to those skilled in the
art of molecular biology, and examples of them can be found, for example, in Ausubel
et al., Current Protocols in Molecular Biology, John Wiley & Sons, New York, N.Y.,
1998, which is hereby incorporated by reference.

A "variant" is a nucleic acid molecule that is a recognized variation of a
nucleic acid molecule or expression product thereof. Splice variants may be
determined for example by using computer programs, e.g., BLAST. Allelic variants
have in general a high percent identity to the nucleic acid molecule of interest. "Single
nucleotide polymorphism" (SNP) refers to a change in a single base as a result of a
substitution, insertion or deletion. The change may be conservative (purine for purine)
or non-conservative (purine to pyrimidine) and may or may not result in a change in
an encoded amino acid. An "analog" is a nucleic acid molecule or polypeptide that
has been subjected to a chemical modification. Nucleic acid analogs can include
substitution of a non-traditional base such as queuosine or of an analog such as
hypoxanthine, or other substitutions known in the art. Analogs in general retain the
biological activities of the naturally occurring molecules but may confer advantages
such as longer lifespan or enhanced activity. By "complementary" or "complement" is
meant that two nucleic acids, e.g., DNA or RNA, contain a sufficient number of
nucleotides which are capable of forming Watson-Crick base pairs to produce a
region of double-strandedness between the two nucleic acids. Thus, adenine in one
strand of DNA or RNA pairs with thymine in an opposing complementary DNA
strand or with uracil in an opposing complementary RNA strand. It will be understood
that each nucleotide in a nucleic acid molecule need not form a matched Watson-Crick
base pair with a nucleotide in an opposing complementary strand to form a
duplex. A nucleic acid molecule is "complementary" to another nucleic acid
molecule, or is a "complement" of that other nucleic acid molecule, if it hybridizes,
under conditions of high stringency, with the second nucleic acid molecule. The
"complement" of a nucleic acid molecule of Tables 1 through 4 may in some
embodiments include a nucleic acid molecule that is complementary over the full
length of the sequence of a nucleic acid molecule of Tables 1 through 4. A
"fragment" may be any portion of a nucleic acid molecule or polypeptide as described
herein that is capable of being differentially expressed or detected in an assay or
screening method according to the invention.
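
The base-pairing and base-substitution definitions above can be illustrated
with a short sketch. The function names are hypothetical. The sketch covers
only strict Watson-Crick pairing over the full length of a DNA sequence (the
definition above also admits duplexes with some unmatched bases, which this
does not model), and it treats a pyrimidine-for-pyrimidine change as
conservative by analogy with the purine-for-purine case, which is an assumption
not stated in the text.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def reverse_complement(dna: str) -> str:
    """Full-length Watson-Crick complement of a DNA strand, read 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution by base class.

    'conservative' when both bases are purines (as in the text) or both are
    pyrimidines (assumed by analogy); 'non-conservative' when the change
    crosses between purine and pyrimidine.
    """
    ref, alt = ref.upper(), alt.upper()
    known = PURINES | PYRIMIDINES
    if ref not in known or alt not in known:
        raise ValueError("expected single bases from A, C, G, T, U")
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "conservative" if same_class else "non-conservative"


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(classify_substitution("A", "G"))
    print(classify_substitution("A", "C"))
```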

Various genes and nucleic acid sequences of the invention may be
recombinant sequences. The term "recombinant" means that something has been
recombined, so that when made in reference to a nucleic acid construct the term refers
to a molecule that is comprised of nucleic acid sequences that are joined together or
produced by means of molecular biological techniques. The term "recombinant" when
made in reference to a protein or a polypeptide refers to a protein or polypeptide
molecule which is expressed using a recombinant nucleic acid construct created by
means of molecular biological techniques. The term "recombinant" when made in
reference to genetic composition refers to a gamete or progeny with new
combinations of alleles that did not occur in the parental genomes. Recombinant
nucleic acid constructs may include a nucleotide sequence which is ligated to, or is
manipulated to become ligated to, a nucleic acid sequence to which it is not ligated in
nature, or to which it is ligated at a different location in nature. Referring to a nucleic
acid construct as "recombinant" therefore indicates that the nucleic acid molecule has
been manipulated using genetic engineering, i.e. by human intervention.
Recombinant nucleic acid constructs may for example be introduced into a host cell
by transformation. Such recombinant nucleic acid constructs may include sequences
derived from the same host cell species or from different host cell species, which have
been isolated and reintroduced into cells of the host species. Recombinant nucleic
acid construct sequences may become integrated into a host cell genome, either as a
result of the original transformation of the host cells, or as the result of subsequent
recombination and/or repair events.

As used herein, "heterologous" in reference to a nucleic acid or protein is a
molecule that has been manipulated by human intervention so that it is located in a
place other than the place in which it is naturally found. For example, a nucleic acid
sequence from one species may be introduced into the genome of another species, or a
nucleic acid sequence from one genomic locus may be moved to another genomic or
extrachromosomal locus in the same species. A heterologous protein includes, for
example, a protein expressed from a heterologous coding sequence or a protein
expressed from a recombinant gene in a cell that would not naturally express the
protein.

A compound is "substantially pure" when it is separated from the components
that naturally accompany it. Typically, a compound is substantially pure when it is at
least 10%, 20%, 30%, 40%, 50%, or 60%, more generally 70%, 75%, 80%, or 85%,
or over 90%, 95%, or 99% by weight, of the total material in a sample. Thus, for
example, a polypeptide that is chemically synthesised, produced by recombinant
technology, or isolated by known purification techniques, will generally be
substantially free from its naturally associated components. A substantially pure
compound can be obtained, for example, by extraction from a natural source; by
expression of a recombinant nucleic acid molecule encoding a polypeptide compound;
or by chemical synthesis. Purity can be measured using any appropriate method such
as column chromatography, gel electrophoresis, HPLC, etc. A nucleic acid molecule
is substantially pure or "isolated" when it is not immediately contiguous with (i.e.,
covalently linked to) the coding sequences with which it is normally contiguous in the
naturally occurring genome of the organism from which the DNA of the invention is
derived. Therefore, an "isolated" gene or nucleic acid molecule is intended to mean a
gene or nucleic acid molecule which is not flanked by nucleic acid molecules which
normally (in nature) flank the gene or nucleic acid molecule (such as in genomic
sequences) and/or has been completely or partially purified from other transcribed
sequences (as in a cDNA or RNA library). For example, an isolated nucleic acid of
the invention may be substantially isolated with respect to the complex cellular milieu
in which it naturally occurs. In some instances, the isolated material will form part of
a composition (for example, a crude extract containing other substances), buffer
system or reagent mix. In other circumstances, the material may be purified to essential
homogeneity, for example as determined by PAGE or column chromatography such
as HPLC. The term therefore includes, e.g., a recombinant nucleic acid incorporated
into a vector, such as an autonomously replicating plasmid or virus; or into the
genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule
(e.g., a cDNA or a genomic DNA fragment produced by PCR or restriction
endonuclease treatment) independent of other sequences. It also includes a
recombinant nucleic acid which is part of a hybrid gene encoding additional
polypeptide sequences. Preferably, an isolated nucleic acid comprises at least about
40%, 50%, 60%, 70%, 80%, 90%, 95%, or 99% (on a molar basis) of all
macromolecular species present. Thus, an isolated gene or nucleic acid molecule can
include a gene or nucleic acid molecule which is synthesized chemically or by
recombinant means. Recombinant DNA contained in a vector is included in the
definition of "isolated" as used herein. Also, isolated nucleic acid molecules include
recombinant DNA molecules in heterologous host cells, as well as partially or
substantially purified DNA molecules in solution. In vivo and in vitro RNA
transcripts of the DNA molecules of the present invention are also encompassed by
"isolated" nucleic acid molecules. Such isolated nucleic acid molecules are useful in
the manufacture of the encoded polypeptide, as probes for isolating homologous
sequences (e.g., from other mammalian species), for gene mapping (e.g., by in situ
hybridization with chromosomes), or for detecting expression of the gene in tissue
(e.g., human tissue, such as peripheral blood), such as by Northern blot analysis.

Polypeptide compounds can be prepared by, for example, replacing, deleting,
or inserting an amino acid residue at any position of a peptide or a peptide analog, for
example, a peptide as described herein, with other conservative amino acid residues,
i.e., residues having similar physical, biological, or chemical properties. It is well
known in the art that some modifications and changes can be made in the structure of
a polypeptide without substantially altering the biological function of that peptide, to
obtain a biologically equivalent polypeptide. In one aspect of the invention,
polypeptides of the present invention also extend to biologically equivalent peptides
that differ from a portion of the sequence of the polypeptides of the present invention
by conservative amino acid substitutions. As used herein, the term "conserved amino
acid substitutions" refers to the substitution of one amino acid for another at a given
location in the peptide, where the substitution can be made without substantial loss of
the relevant function. In making such changes, substitutions of like amino acid
residues can be made on the basis of relative similarity of side-chain substituents, for
example, their size, charge, hydrophobicity, hydrophilicity, and the like, and such
substitutions may be assayed for their effect on the function of the peptide by routine
testing. Conservative changes can also include the substitution of a chemically
derivatised moiety for a non-derivatised residue, by for example, reaction of a
functional side group of an amino acid. Peptides or peptide analogs can be
synthesised by standard chemical techniques, for example, by automated synthesis
using solution or solid phase synthesis methodology. Automated peptide synthesisers
are commercially available and use techniques well known in the art. Peptides and
peptide analogs can also be prepared using recombinant DNA technology using
standard methods such as those described in, for example, Sambrook, et al.
(Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory,
Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989) or Ausubel et
al. (Current Protocols in Molecular Biology, John Wiley & Sons, 1994). Computer
programs such as LASERGENE software (DNASTAR, Madison Wis.),
MACVECTOR software (Genetics Computer Group, Madison Wis.) and RasMol
software (www.umass.edu/microbio/rasmol) may be used to determine
how many amino acid residues in a particular portion of the protein may be
substituted, inserted, or deleted without abolishing biological or immunological
activity.

Monitoring changes in gene expression may also be advantageous when
screening candidate HCC therapeutics. Often candidate compounds are screened and
prescreened for the ability to interact with a major target without regard to other
effects they may have on cells or in the subject to be treated, such as toxicity, which
prevent the development and use of the potential compound. Thus, the methods of the
invention may be used to identify candidate compounds suitable for HCC therapy.

In general, candidate or test compounds are identified from large libraries of
both natural products and synthetic (or semi-synthetic) extracts or chemical libraries
according to methods known in the art. Those skilled in the field of drug discovery
and development will understand that the precise source of test extracts or compounds
is not critical to the methods of the invention. Accordingly, virtually any number of
chemical extracts or compounds can be screened using the exemplary methods
described herein. Examples of such extracts or compounds include, but are not limited
to, plant-, fungal-, prokaryotic- or animal-based extracts, fermentation broths, and
synthetic compounds, as well as modification of existing compounds. Numerous
methods are also available for generating random or directed synthesis (e.g., semi-synthesis
or total synthesis) of any number of chemical compounds, including, but not
limited to, saccharide-, lipid-, peptide-, and nucleic acid-based compounds. Synthetic
compound libraries are commercially available. Alternatively, libraries of natural
compounds in the form of bacterial, fungal, plant, and animal extracts are
commercially available from a number of sources, including Biotics (Sussex, UK),
Xenova (Slough, UK), Harbor Branch Oceanographic Institute (Ft. Pierce, FL, USA),
and PharmaMar, MA, USA. Furthermore, if desired, any library or compound is
readily modified using standard chemical, physical, or biochemical methods.
Candidate compounds useful for treating HCC may also be identified by assessing
variations in the expression of one or more HCC markers, from Tables 1 through 4,
prior to and after contacting HCC cells or tissues with candidate pharmacological
agents for the treatment of HCC. The cells may be grown in culture (e.g. from a HCC
cell line), or may be obtained from a subject (e.g. in a clinical trial of candidate
pharmaceutical agents to treat HCC). Alterations in expression of one or more of
HCC nucleic acid markers (drug targets), in HCC cells or tissues tested before and
after contact with a candidate pharmacological agent to treat HCC, indicate
progression, regression, or stasis of the HCC, thereby indicating efficacy of candidate
agents and concomitant identification of candidate compounds for therapeutic use in
HCC. Candidate compounds may also be screened for toxicity, specificity, etc.

When a crude extract is found to modulate expression levels of any of the
nucleic acid molecules or polypeptides of the invention, further fractionation of the
positive lead extract is necessary to isolate chemical constituents responsible for the
observed effect. Thus, the goal of the extraction, fractionation, and purification
process is the careful characterization and identification of a chemical entity within
the crude extract having the modulatory activities. The same assays described herein
for the detection of activities in mixtures of compounds can be used to purify the
active component and to test derivatives thereof. Methods of fractionation and
purification of such heterogeneous extracts are known in the art. If desired,
compounds shown to be useful agents for treatment are chemically modified
according to methods known in the art. Compounds identified as being of therapeutic,
prophylactic, diagnostic, or other value may be subsequently analyzed using HCC cell
lines or an animal model for HCC.

Arrays, Microarrays, Libraries, Databases, And Kits

In one aspect, the invention provides nucleic acid or polypeptide arrays and
biological assays thereof. Arrays refer to ordered arrangements of at least two nucleic
acid molecules or polypeptides on a substrate, which can be any rigid or semi-rigid
support to which two nucleic acid molecules or polypeptides may be attached. In
some embodiments, a substrate may be a liquid medium. Substrates include
membranes, filters, chips, slides, wafers, fibers, beads, gels, capillaries, plates,
polymers, and microparticles, etc. Because the nucleic acid molecules or polypeptides
are located at specified locations on the substrate, the hybridization or binding
patterns and intensities create a unique expression profile, which can be interpreted in
terms of expression levels of particular genes and can be correlated with HCC
progression, regression, therapy, etc.

High density nucleic acid or polypeptide arrays are also referred to as
"microarrays," and may for example be used to monitor the presence or level of
expression of a large number of genes or polypeptides or for detecting sequence
variations, mutations and polymorphisms. Arrays and microarrays generally require a
solid support (for example, nylon, glass, ceramic, plastic, silica, aluminosilicates,
borosilicates, metal oxides such as alumina and nickel oxide, various clays,
nitrocellulose, etc.) to which the nucleic acid molecules or polypeptides are attached
in a specified 2-dimensional arrangement, such that the pattern of hybridization or
binding to a probe is easily determinable. In some embodiments, at least one of the
nucleic acid molecules or polypeptides is a control, standard, or reference molecule,
such as a housekeeping gene or portion thereof (e.g., PBGD, GAPDH), that may
assist in the normalization of expression levels or assist in the determining of nucleic
acid quality and binding characteristics; reagent quality and effectiveness;
hybridization success; analysis thresholds and success, etc.
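
Normalization against a housekeeping control of this kind can be sketched as
below. This is a minimal illustration, not the procedure used in the Examples:
it assumes raw, background-corrected signal intensities keyed by gene name and
simply divides each by the intensity of the chosen reference gene. The function
name and the intensity values in the usage line are hypothetical.

```python
from typing import Dict


def normalize_to_reference(
    intensities: Dict[str, float], reference_gene: str = "GAPDH"
) -> Dict[str, float]:
    """Express each gene's signal as a ratio to a housekeeping reference.

    Assumes the intensities are already background-corrected; the reference
    gene itself is omitted from the result.
    """
    if reference_gene not in intensities:
        raise KeyError(f"reference gene {reference_gene!r} not measured")
    reference_value = intensities[reference_gene]
    if reference_value <= 0:
        raise ValueError("reference intensity must be positive")
    return {
        gene: value / reference_value
        for gene, value in intensities.items()
        if gene != reference_gene
    }


if __name__ == "__main__":
    # Made-up intensities, for illustration only.
    print(normalize_to_reference({"GAPDH": 200.0, "GPC3": 800.0, "CRHBP": 50.0}))
```

Dividing by a single reference gene is the simplest choice; it makes samples
comparable only to the extent that the reference is itself stably expressed.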

Nucleic acid molecules or polypeptide probes may be derived from
compounds as described herein for example in Tables 1 through 4, and the
compositions of the invention may be used as elements on a microarray to analyze
gene expression profiles. For the purpose of such arrays, "nucleic acids" may include
any polymer or oligomer of nucleosides or nucleotides (polynucleotides or
oligonucleotides), which include pyrimidine and purine bases, preferably cytosine,
thymine, and uracil, and adenine and guanine, respectively. A variety of methods are
known for making and using microarrays, as for example disclosed in Cheung, V.G.,
et al., 1999; Lipshutz, R. J., et al., 1999; Bowtell, D.D.L., 1999; Schweitzer, B.,
2002; and G. MacBeath and S. L. Schreiber, 2000; all of which are incorporated herein
by reference. In some embodiments, the microarray substrate may be coated with a
compound to enhance synthesis of the nucleic acid molecule on the substrate as
disclosed in, for example, U.S. Pat. No. 4,458,066. In some embodiments, probes
may be synthesized directly on the substrate in a predetermined ordered arrangement.
Methods for storing, querying and analyzing microarray data have been
disclosed in, for example, United States Patent No. 6,484,183; United States Patent
No. 6,188,783; and Holloway, A.J., 2002; each of which is incorporated herein by
reference. In an alternative aspect, the invention provides nucleic acid or polypeptide
microarrays including a number of distinct and selected nucleic acid or polypeptide
array sequences of the invention. The number of distinct sequences may for example
be any integer between 2 and 1x10^, such as at least 10^, 10^, lO"*, or 10^. The size of
the distinct sequences may vary depending on the intended use, and can be
determined by a skilled person. For example, the nucleic acid sequences may range
from 15 to 5000 bases or more, or any integer within this range.

Microarrays may also be used to examine the expression of all the genes in a
tissue or cell such as a liver cell or a HCC cell. Thus, the nucleic acid molecules of
Tables 1 through 4 may be attached to a solid support, hybridized with single stranded
detectably-labeled cDNAs (corresponding to a "complementary" orientation), and
quantified using an appropriate method such that a signal is detected at each location
at which hybridization has taken place. The intensity of the signal would then reflect
the amount of gene expression. Similarly, protein microarrays may be used according
to methods known in the art. Comparison of results from different cells or tissue, for
example, hepatocellular carcinoma cells or tissue, hepatitis virus infected cells or
tissue, non-tumor cells or tissue, normal cells or tissue, cirrhotic liver cells or tissue,
or any combination thereof would elucidate differing levels of expression of specified
genes from the different sources.
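
One common way to express such a comparison, shown here only as a sketch, is
the per-gene log2 ratio of tumor to non-tumor signal for a matched pair,
averaged across pairs. The function names are hypothetical, the inputs are
assumed to be positive, already-normalized intensities keyed by gene name, and
the averaging step is an assumption of this sketch rather than the statistical
method of the Examples.

```python
import math
from typing import Dict, Iterable, Tuple

Profile = Dict[str, float]


def log2_ratios(tumor: Profile, non_tumor: Profile) -> Dict[str, float]:
    """Per-gene log2(tumor / non-tumor) for one matched tissue pair."""
    ratios = {}
    for gene in sorted(tumor.keys() & non_tumor.keys()):
        if tumor[gene] <= 0 or non_tumor[gene] <= 0:
            raise ValueError(f"non-positive intensity for {gene}")
        ratios[gene] = math.log2(tumor[gene] / non_tumor[gene])
    return ratios


def mean_log2_ratios(pairs: Iterable[Tuple[Profile, Profile]]) -> Dict[str, float]:
    """Average the per-pair log2 ratios over genes measured in every pair."""
    per_pair = [log2_ratios(tumor, non_tumor) for tumor, non_tumor in pairs]
    if not per_pair:
        raise ValueError("at least one matched pair is required")
    shared = set(per_pair[0])
    for ratios in per_pair[1:]:
        shared &= set(ratios)
    return {
        gene: sum(ratios[gene] for ratios in per_pair) / len(per_pair)
        for gene in sorted(shared)
    }


if __name__ == "__main__":
    # Made-up intensities, for illustration only.
    print(log2_ratios({"GPC3": 8.0, "CRHBP": 1.0}, {"GPC3": 2.0, "CRHBP": 4.0}))
```

A positive value indicates higher signal in the tumor sample and a negative
value indicates lower signal, so up- and down-regulation are treated
symmetrically.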

In one aspect of the invention, libraries may be constructed of bacterial strains
each of which bears a plasmid expressing a different nucleic acid molecule of any one
or more of Tables 1 through 4 under control of an inducible promoter. ORFs are
amplified using PCR and cloned into a vector that enables their expression as
N-terminal his-tagged polypeptides. These amplicons are also used to construct
hybridization microarrays and enable targeted gene disruption, reducing expenses. A
suitable expression host (e.g. E. coli) is selected, and genes encoding particular
biochemical activities are identified by screening arrayed pools of his-tagged proteins
as described previously (Martzen, M.R., et al., 1999).

The invention also provides databases including the nucleic acid and
polypeptide sequences described herein, as well as gene expression information in
various cancerous and non-cancerous liver and liver cell line samples. Such databases
may be used to access information that may aid in diagnosis, prognosis, or other
HCC-related methods of the invention. A database as used herein includes any
electronic form of the compounds (e.g., nucleic acid and polypeptide sequences) of
the invention, and information regarding these compounds, and includes computer
readable media and any suitable form for storing the information.

The invention also provides kits including for example one or more of the
nucleic acid molecules or polypeptides of the invention (or complements, analogs,
variants, or fragments thereof), an appropriate buffer, appropriate reagents for
detection, and appropriate controls. For example, a kit may include probes or primers
(which may or may not be detectably labeled) suitable for hybridization or
amplification, or may include antibodies or ligands suitable for specific binding. A
kit may also include written or electronic instructions.

Diagnostic and Other Uses 

A wide variety of detectable labels and conjugation techniques are known by
those skilled in the art and may be used in various nucleic acid molecule and
polypeptide assays to diagnose HCC. The nucleic acid molecules, proteins,
antibodies and other compounds according to the invention may be labeled for
purposes of assay by joining them, either covalently or noncovalently, with a
detectable label. By "detectably labeled" is meant any means for marking and
identifying the presence of a molecule, e.g., an oligonucleotide probe or primer, a
gene or fragment thereof, a cDNA molecule, or a polypeptide. Methods for
detectably-labeling a molecule are well known in the art and include, without
limitation, radioactive labeling (e.g., with an isotope such as 32P or 35S) and
nonradioactive labeling such as enzymatic labeling (for example, using horseradish
peroxidase or alkaline phosphatase), chemiluminescent labeling, fluorescent labeling
(for example, using fluorescein), bioluminescent labeling, or antibody detection of a
ligand attached to the probe. Also included in this definition is a molecule that is
detectably labeled by an indirect means, for example, a molecule that is bound with a
first moiety (such as biotin) that is, in turn, bound to a second moiety that may be
observed or assayed (such as fluorescein-labeled streptavidin). Labels also include
digoxigenin, luciferases, and aequorin. Synthesis of labeled molecules may be performed
by using labels such as 32P-dCTP, Cy3-dCTP or Cy5-dCTP or 35S-methionine.
Compounds according to the invention may also be directly labeled by chemical
conjugation to amines, thiols and other groups present in the molecules using reagents
such as BODIPY or FITC (Molecular Probes, Eugene, OR, USA).

Compounds, compositions, and reagents according to the invention may be
used to detect and quantify differential gene expression; absence, presence, or excess
expression of nucleic acid molecules (e.g., mRNAs) or polypeptides; or to monitor
nucleic acid molecule (e.g., mRNA) or polypeptide levels during therapeutic
intervention in subjects with HCC. The compounds, compositions, and reagents
according to the invention can also be utilized as markers of HCC treatment efficacy
over a period ranging from days to months to years. The diagnostic assays may use
hybridization, amplification, ligand binding, or antibody technologies to compare
gene expression in a biological sample from a subject to reference samples or
standards, or to cancerous and non-cancerous samples from the subject, in order to
detect altered gene expression. Qualitative or quantitative methods for this
comparison are known in the art, and any suitable method may be used.

In order to provide a basis for the diagnosis of HCC, a non-tumor or standard
gene expression profile may be established. This may be accomplished by combining
a biological sample taken from normal or non-tumor subjects or from non-cancerous
tissue from a subject with HCC, with a probe under conditions for hybridization or
amplification. Standard hybridization may be quantified by comparing the values
obtained using non-tumor subjects or non-cancerous tissue with values from an
experiment in which a known amount of a substantially purified target sequence is
used. Standard values obtained in this manner may be compared with values obtained
from samples from patients who are symptomatic for HCC. Deviation from standard
values toward those associated with HCC is used to diagnose HCC. Such assays may
also be used to monitor the efficacy of a particular HCC therapy in animal studies, in
clinical trials, or to monitor the treatment of an individual patient or groups of
patients. Once the presence of HCC is established in a subject and a treatment
protocol is initiated, assays according to the invention may be repeated on a regular
basis to determine if the level of expression in the subject begins to approximate that
which is observed in a non-tumor subject, and to monitor the progression of HCC in
the subject. The results obtained from successive assays may be used to show the
efficacy of treatment over a period ranging from several days to months.

Compounds, compositions, and reagents (e.g., microarrays) according to the
invention may be used to monitor the progression or regression of HCC. The
differences in gene expression between healthy and diseased tissues or cells can be
assessed and cataloged. By analyzing changes in patterns of gene expression, HCC
can be diagnosed at earlier stages before the subject is symptomatic. Similarly, by
analyzing gene expression profiles and changes therein, prognoses may be
formulated, and therapies may be designed. Progression or regression of HCC may
be determined by comparison of two or more different HCC samples taken at multiple
different times from a subject (e.g., at least 2, 3, 4, or 5 or more time points) over the
course of days to months. For example, progression or regression may be evaluated
by assessments of expression of sets of two or more, or as many as all, of the nucleic
acid molecules of Tables 1 through 4 in an HCC tissue sample from a subject before,
during, and following treatment for HCC.

Compounds, compositions, and reagents (e.g., microarrays) according to the
invention can also be used to monitor the efficacy of a therapy. For therapies with
known side effects, compounds, compositions, and reagents (e.g., microarrays)
according to the invention may be employed to improve the therapeutic regimen. For
example, dosages that cause changes in gene profiles that represent efficacious
treatment may be determined, and expression profiles associated with the onset of
undesirable side effects may be avoided. This approach may be more sensitive and
rapid than waiting for the subject to show inadequate improvement, or to manifest
side effects, before altering the course of treatment. In another aspect of the invention,
pre- and post-treatment alterations in expression of two or more sets of HCC nucleic
acid molecules in HCC cells or tissues may be used to assess treatment parameters
including, but not limited to: dosage, method of administration, timing of
administration, and combination with other known treatments for HCC.

In some aspects, any one or more of the compounds provided herein may be
used in therapeutic applications. For example, selected compounds provided herein
may be used as therapeutic targets for the identification of agents that modulate their
expression levels and/or activity and that may be used to treat HCC.

EXAMPLES

Experimental Procedures

RNA isolation, RNA amplification and cDNA microarray hybridization

Paired samples of tumor and corresponding non-tumor tissues were obtained
from resected liver specimens from thirty-seven (37) patients who had been diagnosed
with hepatitis B virus (HBV)-associated HCC and had undergone curative liver
resection. A validation tissue set composed of 58 liver biopsy samples from an
independent cohort of twenty-nine (29) patients, who also had HBV-associated HCC
and had undergone liver resection, was used. Informed consent from the patients and
institutional research and ethics committee approval were obtained. Tissues were snap
frozen in liquid nitrogen and stored at -150°C. A small section of each specimen was
sampled and total RNA was isolated from tissues using TRIZOL® reagent (Life
Technologies, Bethesda, MD, USA) according to the manufacturer's instructions. The
integrity of the RNA specimen was verified by gel electrophoresis.

The human liver cancer cell lines used in this study were: PLC/PRF/5,
HA22T, Huh1, Huh4, Tong, Hep3B, SNU182, SNU449, SNU475, HepG2,
Huh6, Huh7, SKHep1, and Mahlavu. All cell lines were cultured under
conditions recommended by the American Type Culture Collection (VA, USA).
Total RNA was extracted using TRIZOL® reagent (Life Technologies, Bethesda, MD,
USA) according to the manufacturer's instructions.

RNA was linearly amplified using a procedure modified from Eberwine and
coworkers (Eberwine et al., 1992). Briefly, total RNA was reverse transcribed using a
63-nucleotide synthetic primer containing the T7 polymerase binding site 5'-
GGCCAGTGAATTGTAATACGACTCACTATAGGGAGGCGG(T)24-3'.
Full-length double-stranded cDNA synthesis was accomplished in the presence of E. coli
DNA polymerase I, DNA ligase and RNase H. The cDNA was made blunt-ended
with T4 DNA polymerase, and purified by extraction in a mixture of phenol,
chloroform and isoamyl alcohol, and precipitation in the presence of ammonium
acetate and ethanol. Purified double-stranded cDNA was then transcribed with T7
polymerase (T7 MEGAscript® Kit, Ambion) to yield linearly amplified antisense RNA,
which was subsequently purified with RNeasy® mini-columns (Qiagen). Human
universal reference RNA (Stratagene, La Jolla, CA), including total RNA from 10
different human cell lines, was amplified and used as the reference for cDNA
microarray analysis.

Approximately 9000 human cDNA features (Incyte Genomics, Palo Alto, CA,
USA) were spotted onto poly-L-lysine coated slides using an OmniGrid® arrayer
(GeneMachines). Probes were generated from the amplified RNA material and
hybridized to the chip as described elsewhere (Sotiriou et al., 2002). Briefly, 4 µg of
amplified RNA was reverse-transcribed using random hexamers and directly
labeled with Cy3-conjugated dUTP (reference RNA) or Cy5-conjugated dUTP
(sample RNA). Hybridization was performed in the presence of 25% formamide
and 5X SSC for 16 h at 42°C. Slides were scanned with an Axon 4000b laser scanner
(Axon Instruments) after washing and drying. To minimize the effects of labeling
biases, reciprocal dye swap labeling experiments were performed for each sample.

Data analysis 

The 37 paired HCC tumor and non-tumor liver samples, and liver cancer
cell lines were processed on the microarray on two separate prints, and the validation
tissue set was processed on a third print. Raw data were analyzed on GenePix analysis
software version 3.0 (Axon Instruments, Burlingame, CA, USA) and uploaded to a
relational database maintained by the Center for Information Technology at the
National Institutes of Health (i.e., MADB). The cDNA clones used for the microarray
are represented by their UniGene identifiers. For each array, the
logarithmic expression ratio for each spot was normalized by subtracting
the median logarithmic ratio for the same array. Data were filtered to exclude spots
with a size of less than 25 µm and any poor quality or missing spots. Since the
correlation of the overall data from reciprocal labeling was good, values obtained
from reciprocal labeling experiments were averaged. In addition, any gene features
that were found to be absent from the data in more than 50% of patient samples in
either set of arrays were excluded, and gene features that were common in data from
the array print sets were retained. Application of these filters resulted in the inclusion
of 8716 of the total 9127 features in subsequent analysis. Statistical comparison of
genes between HCC tumors and non-tumors was performed by the Wilcoxon rank-sum
non-parametric test. To evaluate gene expression patterns, hierarchical clustering
using the one minus Pearson's correlation metric and average linkage (Eisen et al., 1998)
and multidimensional scaling were performed on normalized data (mean equals zero,
standard deviation equals one). Functional characterization of genes was based on
Gene Ontology (The Gene Ontology Consortium, 2000) and other published works
known to those of ordinary skill in the art.
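
The per-array normalization, dye-swap averaging, and missing-data filter described above may be illustrated by the following minimal sketch. It is not the software used in the Examples: the function names and toy values are hypothetical, missing spots are assumed to be stored as `None`, and the dye-swapped array is assumed to report the log ratio with inverted sign, so that it is negated before averaging.

```python
from statistics import median


def median_center(log_ratios):
    """Subtract the array-wide median log ratio from every spot (None = missing)."""
    present = [v for v in log_ratios if v is not None]
    m = median(present)
    return [None if v is None else v - m for v in log_ratios]


def average_dye_swap(forward, swapped):
    """Average a forward array with its dye-swapped replicate.

    Assumption: the swapped array stores log(reference/sample), so its sign is
    flipped before averaging. A spot missing on either array stays missing.
    """
    return [
        None if f is None or s is None else (f - s) / 2.0
        for f, s in zip(forward, swapped)
    ]


def mostly_present(values, max_missing_fraction=0.5):
    """Keep a feature unless it is absent in more than 50% of the samples."""
    missing = sum(v is None for v in values)
    return missing <= max_missing_fraction * len(values)


# Toy example with five spots on one forward array and its dye-swap replicate.
forward = median_center([1.0, 2.0, 3.0, None, 0.0])
swapped = median_center([-1.0, -2.0, -3.0, 0.5, 0.0])
combined = average_dye_swap(forward, swapped)
```

In this sketch the filter is applied per feature across samples, after the two reciprocal arrays for each sample have been combined.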

The quality of a set of selected gene features to be used as potential
markers was measured by estimating the probability that its observed performance,
in terms of number of misclassified tissue samples, could occur by chance alone. This
was achieved by performing a series of Monte Carlo simulations (Davison and
Hinkley, 1997) upon the expression data of the selected genes. In each simulation, the
tissues' labels were randomly permuted and the number of misclassifications was
noted. A total of one million runs of Monte Carlo simulations were performed. The
reported P-value (denoted as Pa) is the fraction of times the permutations generated as
few misclassifications as, or fewer than, the original labeling. To determine whether
the set of genes observed to have a good performance as tumor discriminators could
appear merely by chance, different Monte Carlo simulations were carried out. In
each simulation, an equivalent number of gene features was randomly picked from a
designated large population of features, and the performance of the random gene set
was evaluated by the number of tissue samples that were misclassified. A total of
10,000 runs of Monte Carlo simulations were performed for each evaluation. The P-value
(Pb) is the fraction of times the random gene set performed as well as, or better
than, the selected gene set.
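
The label-permutation estimate of Pa described above may be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the classifier that counts misclassified samples is not specified for this step, so a simple nearest-centroid rule (`count_errors`, a hypothetical stand-in) is used, and the number of runs is reduced from one million to keep the example fast.

```python
import random


def count_errors(profiles, labels):
    """Hypothetical stand-in classifier: assign each sample to the nearest
    class centroid (squared Euclidean distance) and count disagreements."""
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        members = [p for p, lab in zip(profiles, labels) if lab == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    errors = 0
    for p, lab in zip(profiles, labels):
        nearest = min(
            classes,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        if nearest != lab:
            errors += 1
    return errors


def permutation_p_value(profiles, labels, n_runs=2000, seed=0):
    """Fraction of label permutations giving as few or fewer errors than the
    original labeling (the quantity called Pa in the text)."""
    rng = random.Random(seed)
    observed = count_errors(profiles, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_runs):
        rng.shuffle(shuffled)
        if count_errors(profiles, shuffled) <= observed:
            hits += 1
    return hits / n_runs


# Toy data: one gene feature, six tumor and six non-tumor samples.
profiles = [[5.0 + 0.1 * i] for i in range(6)] + [[0.1 * i] for i in range(6)]
labels = ["tumor"] * 6 + ["non-tumor"] * 6
p_a = permutation_p_value(profiles, labels)
```

Pb may be estimated with the same loop by drawing random gene sets of equal size instead of permuting the labels.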

The significance of the number of observed overlapping genes after
intersection of the important gene lists, derived as described herein, with gene lists
reported previously was approximated by measuring the probabilities that such
overlap could occur by chance alone. A separate series of Monte Carlo simulations
was employed to estimate the P-values of the two-group comparisons. In each
simulation, two lists of genes corresponding to the two groups were generated. Each
list was constructed by randomly selecting genes, as many as the number of genes in
its corresponding group, from the entire collection of genes in the
respective microarray gene set. The two random gene lists were then intersected.
The P-value (Pc) of the comparison was obtained by generating and intersecting the
two random lists 100 million times, and reported as the fraction of times the random
overlap is equal to or greater than the observed one. To validate the utility of the various
expression cassettes to distinguish HCC tumor from non-tumor liver, the prediction
accuracy of each discriminator cassette was assessed on an independent tissue set
comprising 58 liver clinical biopsies from 29 patients using a k-Nearest Neighbor
(kNN) classification algorithm (k=3) using Pearson correlation to measure the
similarity between expression profiles. The algorithm was trained against the dataset
comprising 74 tissue samples from 37 patients before testing against the new tissue
set.
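
The two procedures in the preceding paragraph, the Monte Carlo estimate of Pc for list overlap and the kNN classifier (k=3) with Pearson correlation as the similarity measure, may be sketched as follows. The function names, the toy profiles, and the reduced number of runs are illustrative assumptions; majority voting among the k neighbors is assumed, since the voting rule is not stated above.

```python
import math
import random
from collections import Counter


def overlap_p_value(universe_a, universe_b, size_a, size_b,
                    observed_overlap, n_runs=10000, seed=0):
    """Fraction of runs in which two random gene lists overlap at least as
    much as the observed overlap (the quantity called Pc in the text)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        a = set(rng.sample(universe_a, size_a))
        b = set(rng.sample(universe_b, size_b))
        if len(a & b) >= observed_overlap:
            hits += 1
    return hits / n_runs


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def knn_predict(train_profiles, train_labels, query, k=3):
    """Label a query profile by majority vote of its k most correlated
    training profiles."""
    ranked = sorted(
        zip(train_profiles, train_labels),
        key=lambda item: pearson(item[0], query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set: four gene features per sample.
train_profiles = [
    [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 6.0], [1.0, 3.0, 5.0, 7.0],
    [4.0, 3.0, 2.0, 1.0], [6.0, 4.0, 3.0, 2.0], [7.0, 5.0, 3.0, 1.0],
]
train_labels = ["tumor"] * 3 + ["non-tumor"] * 3
prediction = knn_predict(train_profiles, train_labels, [0.0, 1.0, 2.0, 3.5])
```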

Real-time semi-quantitative RT-PCR

Total RNA from individual tissue samples was analyzed for the expression
levels of selected genes by real-time semi-quantitative RT-PCR using the LightCycler
RNA Amplification Kit SYBR Green I on the LightCycler (Roche, Basel, Switzerland)
according to the manufacturer's instructions. Briefly, one-step RT-PCR reactions
consisted of an initial incubation at 55°C for 10 min, followed by a denaturation step
at 95°C for 30 s, and amplification for 40 cycles of 1 s at 95°C, 10 s at 57°C, and 13 s
at 72°C. For each reaction, 10 ng of total RNA was analyzed. The gene-specific
primers designed were, for example, as follows: IGFBP3 5'-
ATAATCATCATCAAGAAAGGGCAT-3' and 5'-GAAGGGCGACACTGCTT-3';
EGFR 5'-GCGTCTCTTGCX::GGAATG-3' and 5'-GCTCAGCCTCCAGAAGCTT-3';
ERBB2 5'-GGATGTGCGGCTCGTACAG-3' and 5'-
TAATrTTGACATGGTrGGKMCTClT-3'; ERBB3 5'-
CGGTTATGTCATGCCAGATACAC-3' and 5'-
ACAGAACTGAGACCCACTGAAGAA-3'; PBGD 5'-
GAGTGATTCGCGTGGGTACC-3' and 5'-GGCTCCGATGGTGAAGCC-3'. The
relative expression level of each gene of interest in each individual tissue sample
was normalized against that of the "housekeeping" gene PBGD. Data are presented as
the level of gene expression in each HCC tumor relative to its corresponding non-tumor
liver specimen.
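
The normalization against PBGD and the tumor-versus-non-tumor comparison described above may be illustrated with the widely used comparative threshold-cycle calculation. This is only a sketch under stated assumptions: the text does not say which formula was applied, the function name is hypothetical, crossing-point (Ct) values are assumed to be available for each reaction, and an amplification efficiency of 2 per cycle is assumed.

```python
def relative_expression(ct_gene_tumor, ct_ref_tumor,
                        ct_gene_normal, ct_ref_normal, efficiency=2.0):
    """Expression of a gene in tumor relative to paired non-tumor tissue.

    Each Ct is first normalized to the reference gene (here PBGD) measured in
    the same sample; the tumor value is then expressed relative to the
    non-tumor value. Assumes equal amplification efficiency for both genes.
    """
    delta_tumor = ct_gene_tumor - ct_ref_tumor
    delta_normal = ct_gene_normal - ct_ref_normal
    return efficiency ** -(delta_tumor - delta_normal)


# Toy values: the gene of interest crosses threshold two cycles earlier in
# tumor than in non-tumor tissue, while PBGD is unchanged.
fold = relative_expression(24.0, 22.0, 26.0, 22.0)
```

A result above 1 indicates higher expression in the tumor than in the paired non-tumor specimen; a result below 1 indicates lower expression.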

Assessment Of Global Gene Expression Differences Between HCC Tumors And
Non-Tumor Liver Specimens

The gene expression patterns of primary HCC tumors and the
corresponding non-tumor liver tissues from 37 patients were examined by cDNA
microarray. Amplified RNA prepared from each experimental sample was labeled
with Cy5 and hybridized on the array with pooled human 'common
reference' amplified RNA labeled with Cy3. Reciprocal dye swap replicate
hybridizations were performed to minimize technical noise. Since the overall
correlation of reciprocal labeling was good, values obtained from reciprocal
labeling experiments were averaged and used in subsequent analyses. Firstly, the
overall natural patterns of gene expression in the HCC tumor and non-tumor liver
tissues were assessed based on unsupervised hierarchical clustering. Analysis of
variance in expression levels for each gene across all the tissues indicated that 500
gene features (containing 493 unique UniGenes) showed the largest variability across
both HCC tumor and non-tumor liver tissues (Figure 1). Included in this list are AFP,
an often used prognostic marker for HCC, and other genes associated with HCC such
as HGF, MYC, and a ras family member RAN. Hierarchical clustering analysis based
on these highly variant genes (derived from the 37 pairs of HCC tumor and non-tumor
liver samples and using the 500 most variable gene features) separated the tissues
into two main clusters, one representing the HCC tumors and the other, the non-tumor
liver tissues, with only six of 37 HCC tumors misclassified as non-tumors. Thus, the
molecular configuration of HCC can be readily distinguished from that of non-tumor
liver with minimal data manipulation.
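
The selection of the most variable gene features and the distance used for clustering may be sketched as follows. The function names and toy values are hypothetical, population variance is assumed as the measure of variability, and the average-linkage clustering step itself is not implemented here; only its inputs are shown.

```python
import math
from statistics import pvariance


def top_variable_features(expression, n):
    """Return the n feature names with the largest variance across samples.

    `expression` maps a feature name to its list of normalized log ratios.
    """
    ranked = sorted(expression, key=lambda g: pvariance(expression[g]),
                    reverse=True)
    return ranked[:n]


def correlation_distance(x, y):
    """One minus Pearson's correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


# Toy example: three features measured in four tissues.
expression = {
    "geneA": [0.0, 0.1, 0.0, 0.1],
    "geneB": [2.0, -2.0, 1.5, -1.5],
    "geneC": [0.5, 0.5, 0.6, 0.5],
}
selected = top_variable_features(expression, 1)
```

In the analysis described above, n would be 500 and the pairwise distances between tissue profiles would be passed to an average-linkage clustering routine.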

Next, to investigate differential gene expression patterns between HCC tumors
and non-tumor livers, the Wilcoxon rank-sum test was used and the top 2.5%
candidate genes which displayed the smallest (best) P-value scores (P<1x10"^) and at
least 1.5-fold change in gene expression were identified, resulting in a list of 218
genes (Table 1). For these 218 genes, false discovery rate analysis indicated a false-positive
error of less than 0.4%. Multidimensional scaling analysis based on these
outlier genes indicated that the HCC tumors were a more heterogeneous population
than the non-tumor liver tissues (Figure 2).
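
The screen described above, a Wilcoxon rank-sum test combined with a 1.5-fold change threshold, may be sketched as follows. The sketch makes several assumptions that are not stated in the text: the P-value is computed with the normal approximation and without a tie correction to the variance, expression values are log2 ratios, and the P-value cutoff is passed in as `alpha` because its exponent is not legible in the source. All names are hypothetical.

```python
import math


def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum P-value (normal approximation, midranks
    for ties, no tie correction to the variance)."""
    pooled = sorted((v, g) for g, group in enumerate((x, y)) for v in group)
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


def passes_screen(tumor_log2, normal_log2, alpha, min_fold=1.5):
    """True if the gene is significant at `alpha` and the mean log2 ratios
    differ by at least `min_fold` in either direction."""
    diff = (sum(tumor_log2) / len(tumor_log2)
            - sum(normal_log2) / len(normal_log2))
    significant = rank_sum_p_value(tumor_log2, normal_log2) < alpha
    return significant and abs(diff) >= math.log2(min_fold)


# Toy example: ten tumor values all higher than ten non-tumor values.
tumor = [float(v) for v in range(10, 20)]
normal = [float(v) for v in range(0, 10)]
p_value = rank_sum_p_value(tumor, normal)
```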

Cancer cell lines derived from the primary tumor have traditionally been used
as in vitro model systems for investigating the function of genes in the in vivo tumor
environment. Using the 218 differentially expressed outlier genes identified in the
clinical samples, the expression pattern of the same genes in 14 established human
liver cancer cell lines was analyzed. These cell lines exhibited gene expression
profiles that were different from the clinical HCC tumor tissues (Figure 2), suggesting
that they may have accumulated additional genetic or epigenetic alterations in culture
and are not entirely reflective of the primary tumor biology.

Identification Of Gene Clusters Differentially Expressed In HCC Tumors

Among the statistically significant 218 genes that distinguished HCC
tumors from non-tumor liver tissue specimens, more genes were observed to be
overexpressed than under-expressed in the malignant tissue specimens relative to the
non-tumor tissue specimens. Mapping of the chromosomal location of these 218
unique outlier genes indicated that a disproportionate number of genes was located on
chromosome 1 (Figure 3A), particularly in the 1q region, and that the majority of these
genes were more highly expressed in the tumor tissues. Further characterization of
these outlier genes revealed that a substantial proportion of genes was involved in
transport (e.g., PEA15), RNA processing (e.g., RDBP), and metabolic processes (e.g.,
NME1) and showed increased expression in HCC tumor specimens, possibly
indicating accelerated rates of metabolism (Figure 3B, Table 1). Several outlier genes
(e.g., SMT3H1) are members of the ubiquitin-proteasome pathway, suggesting
deregulation of this pathway in HCC. A gene cluster associated with lymphocyte
infiltrate that included the expression of genes such as IGKC and IGJ was
observed, and transcription factors (e.g., ESR1) and genes involved in controlling
growth and differentiation (e.g., GRN), and signal transduction (e.g., CSTB) formed
the other dominant gene groups. Notably, the polycomb group protein BMI1 was
consistently expressed at much higher levels in HCC tumor specimens (Figure 4).

Table 1. Genes significantly differentially expressed between HCC tumor and
non-tumor liver tissues.



| Function | Gene Symbol | Gene Name | UniGene Identifier | Expression change in HCC tumor* | GenBank No. |
| transcription factors | ILF2 | interleukin enhancer binding factor 2, 45kD | Hs.75117 | ? | AA307289 |
| | BMI1 | murine leukemia viral (bmi-1) oncogene homolog | Hs.431 | ? | AA884913 |
| | TAF9 | TAF9 RNA polymerase II, TATA box binding protein (TBP)-associated factor, 32 kD | Hs.60679 | ? | U21858 |
| | RFX5 | regulatory factor X, 5 (influences HLA class II expression) | Hs.166891 | 7 | AL050135 |
| | | structure specific recognition protein 1 | Hs.79162 | ? | AI635077 |
| | ZNF146 | zinc finger protein 146 | Hs.301819 | ? | X70394 |
| | | sterol regulatory element binding transcription factor 2 | Hs.108689 | 7 | AA608556 |
| | MAFG | v-maf musculoaponeurotic fibrosarcoma (avian) oncogene family, protein G | Hs.252229 | ? | AF059195 |
| | CHD4 | chromodomain helicase DNA binding protein 4 | Hs.74441 | 7 | BE408958 |
| | | nuclear receptor subfamily 4, group A, member 1 | Hs.1119 | ? | NM_002135 |
| | ESR1 | estrogen receptor 1 | Hs.1657 | 7 | AL078582 |
| | ZNF238 | zinc finger protein 238 | Hs^69997 | ? | AJ223321 |
| | FOSB | FBJ murine osteosarcoma viral oncogene homolog B | Hs.75678 | ? | L49169 |
| | ID1 | inhibitor of DNA binding 1, dominant negative helix-loop-helix protein | Hs.75424 | 7 | S78825 |
| | FOS | v-fos FBJ murine osteosarcoma viral oncogene homolog | Hs.25647 | 7 | V01512 |














| RNA processing | H2AFY | H2A histone family, member Y | Hs.75258 | 7 | AA307460 |
| | SNRPB | small nuclear ribonucleoprotein polypeptides B and B1 | Hs.83753 | ? | aE252108 |
| | RPS7 | ribosomal protein S7 | Hs.301547 | ? | AA315872 |
| | MRPS14 | mitochondrial ribosomal protein S14 | Hs.247324 | ? | AW973521 |
| | HNRPU | heterogeneous nuclear ribonucleoprotein U (scaffold attachment factor A) | Hs.103804 | 7 | X65488 |
| | SNRPD2 | small nuclear ribonucleoprotein D2 | Hs.53125 | ? | AA315774 |
| | NCL | nucleolin | Hs.79110 | ? | AK000250 |
| | | nDosoiuoi' proiGin o lu | HS.7o230 | / | AW245775 |
| | RPL6 | ribosomal protein L6 | Hs.349961 | ? | AW675430 |
| | SFPQ | splicing factor proline/glutamine rich (polypyrimidine tract-binding protein-associated) | Hs.180610 | | X70944 |
| | DIM1 | similar to S. pombe dim1+ | Hs.5074 | ? | AI814618 |
| | MARS | methionine-tRNA synthetase | Hs.279946 | | BE299937 |
| | SFRS9 | splicing factor, arginine/serine-rich 9 | Hs.77608 | ? | AL021546 |
| | RBM3 | RNA binding motif protein 3 | Hs.301404 | ? | NM_006743 |
| | U2AF65 | U2 small nuclear ribonucleoprotein auxiliary factor (65kD) | Hs.7655 | ? | AA936430 |
| | SFRS1 | splicing factor, arginine/serine-rich 1 (splicing factor 2, alternate splicing factor) | Hs.73737 | ? | M72709 |
| | SNRPE | small nuclear ribonucleoprotein polypeptide E | Hs.334612 | ? | X12466 |
| | SF3B4 | splicing factor 3b, subunit 4, 49kD | riS.ZD797 | / | NM_005850 |
| | | r\u rviMM-Dinaing pfoieni | ,ris. lUoUb 1 | | XI0IO5 |
| | | small nuclear ribonucleoprotein | Hs.105465 | ? | AA649986 |
| | RRM1 | ribonucleotide reductase M1 polypeptide | Hs^34 | ? | X59543 |
| | RR.38 | | Hs 201 7 | O s | |
| | HNRPH1 | heterogeneous nuclear ribonucleoprotein H1 (H) | Hs^45710 | ? | BE296051 |
| | U5-116KD | U5 snRNP-specific protein, 116 kD | nS. ID 1 / Off | | |
| | RPLP1 | ribosomal protein, large, P1 | Hs.177592 | ? | AW963733 |
| | OXA1L | oxidase (cytochrome c) assembly 1-like | nS.1o1134 | ? | X80695 |














| DNA replication/ | | ADP-ribosyltransferase (NAD+; poly (ADP-ribose) polymerase) | Hs.177766 | ? | M18112 |
| | PRKDC | protein kinase, DNA-activated, catalytic polypeptide | Hs.155637 | ? | U34994 |
| | SMC4L1 | SMC4 (structural maintenance of chromosomes 4, yeast)-like 1 | Hs.50758 | ? | AB019987 |
| | H2AV | histone H2A.F/Z variant | Hs.301dOS | ? | BE409809 |
| | | flap structure-specific endonuclease 1 | Hs.4756 | ? | BE278623 |
| | MCM2 | minichromosome maintenance deficient (S. cerevisiae) 2 (mitotin) | Hs.57101 | ? | BE250461 |
| | HAT1 | | | •7 | |
| | RAD50 | homolog | Hs 41<)a7 | | 77*\^'l*1 ^rooii |
| | CBX1 | chromobox homolog 1 (HP1 beta homolog Drosophila) | Hs.77254 | ? | AL046741 |
| | CSPG6 | chondroitin sulfate | Hs.24485 | ? | NMJ00544 O |
| | FUS | fusion, derived from t(12;16) malignant liposarcoma | Hs.99969 | ? | BE396632 |
| | | | ns.oUH 1 | | |


















| cell cycle/ growth/ differentiation | GPC3 | glypican 3 | Ms.naooi | 7 | U50410 |
| | CDKN2A | cyclin-dependent kinase inhibitor 2A (melanoma, p16, inhibits CDK4) | ns.i 1 / *v | # | Aloo9o22 |
| | MDK | midkine (neurite growth-promoting factor 2) | liS.oZ04o | 7 | AA427949 |
| | NTRK1 | neurotrophic tyrosine kinase, receptor, type 1 | | / | AA075110 |
| | CCNE2 | cyclin E2 | ilS.oU*K>*f | o | NlviJl>U47u |
| | HDGF | hepatoma-derived growth factor (high-mobility group protein 1-like) | | | |
| | TP53BP2 | tumor protein p53-binding protein, 2 | Hs.44585 | ? | AI123916 |
| | CDC23 | CDC23 (cell division cycle 23, yeast, homolog) | Hs.153546 | ? | AF053977 |
| | GRN | granulin | Hs.180577 | 7 | |
| | GHR | growth hormone receptor | Hs.125180 | | |
| | IGFBP3 | insulin-like growth factor binding protein 3 | Hs.77326 | ? | BE336g44 |
| | CYR61 | cysteine-rich, angiogenic inducer, 61 | Hs.8867 | ? | Y12084 |
| | HGF | hepatocyte growth factor (hepapoietin A; scatter factor) | Hs.809 | 7 | X16323 |














| apoptosis | DAP3 | death associated protein 3 | Hs.159627 | ? | AA207194 |
| | PDCD5 | programmed cell death 5 | Hs 16646B | 7 | AA452724 |


| immune response | PPIA | peptidylprolyl isomerase A (cyclophilin A) | Hs.342389 | ? | AW732921 |
| | TMPO | thymopoietin | Hs.11355 | ? | U09087 |
| | PPIB | peptidylprolyl isomerase B (cyclophilin B) | Hs.699 | ? | BE386706 |
| | CD5L | CD5 antigen-like (scavenger receptor cysteine rich family) | Hs.52002 | ? | NM_005894 |
| | SCYA14 | small inducible cytokine subfamily A (Cys-Cys), member 14 | Hs^144 | ? | NM_004166 |
| | SDF1 | stromal cell-derived factor 1 | Hs.237356 | ? | L36033 |
| | IGHG3 | immunoglobulin heavy constant gamma 3 (G3m marker) | Hs.300697 | ? | D78345 |
| | C7 | complement component 7 | Hs.78065 | ? | X86328 |
| | IGJ | immunoglobulin J polypeptide, linker protein for immunoglobulin alpha and mu polypeptides | Hs.76325 | ? | AW172754 |
| | | immunoglobulin kappa constant | Hs.156110 | ? | AW404507 |


| cell adhesion/ cytoskeletal organization | LBR | lamin B receptor | Hs.152931 | ? | L25931 |
| | ITGB1 | integrin, beta 1 (fibronectin receptor, beta polypeptide, antigen CD29 includes MDF2, MSK12) | Hs.287797 | ? | W38716 |
| | LAMR1 | laminin receptor 1 (67kD, ribosomal protein SA) | Hs.181357 | ? | AW328280 |
| | CAPZA2 | capping protein (actin filament) muscle Z-line, alpha 2 | Hs.75546 | ? | U03851 |
| | ICAP-1A | integrin cytoplasmic domain-associated protein 1 | Hs.173274 | ? | AF012023 |
| | DNCH1 | dynein, cytoplasmic, heavy polypeptide 1 | Hs.7720 | ? | AB002323 |
| | ARPC1A | actin related protein 2/3 complex, subunit 1A (41 kD) | Hs.90370 | ? | Y08999 |
| | DPT | dermatopontin | Hs.80552 | ? | AW016451 |
| | MMP15 | matrix metalloproteinase 15 (membrane-inserted) | Hs.80343 | ? | D85510 |
| | ARHE | ras homolog gene family, member E | Hs.6838 | ? | W03441 |


| signal transduction | CAP2 | adenylyl cyclase-associated protein 2 | Hs.296341 | ? | AW779995 |
| | CSTB | cystatin B (stefin B) | Hs.695 | ? | AI831499 |
| | ARMET | arginine-rich, mutated in early stage tumors | Hs.75412 | ? | AA582041 |
| | EFNA1 | ephrin-A1 | Hs.1624 | ? | NM_004428 |
| | PPP2R5A | protein phosphatase 2, regulatory subunit B (B56), alpha isoform | Hs.155079 | ? | AA234460 |
| | RAN | RAN, member RAS oncogene family | Hs.10842 | ? | NM_006325 |
| | CALM2 | calmodulin 2 (phosphorylase kinase, delta) | Hs.182278 | ? | D45887 |
| | LASP1 | LIM and SH3 protein 1 | Hs.334851 | ? | AI304506 |
| | SHC1 | SHC (Src homology 2 domain-containing) transforming protein 1 | Hs.81972 | ? | X68148 |
| | RGS5 | regulator of G-protein signalling 5 | Hs.24950 | ? | AI674877 |
| | HAX1 | HS1 binding protein | Hs.1531o | | |
| | GABRE | gamma-aminobutyric acid (GABA) A receptor, epsilon | Hs.22785 | ? | NM_004961 |
| | ARFGEF2 | ADP-ribosylation factor guanine nucleotide-exchange factor 2 (brefeldin A-inhibited) | Hs.118249 | o f | AACnlsloo^ |
| | MAPK6 | mitogen-activated protein kinase 6 | Hs.271980 | ? | NMJ00274 o o |
| | GNB2L1 | guanine nucleotide binding protein (G protein), beta polypeptide 2-like 1 | Hs.5662 | ? | BE206815 |
| | ERBB3 | v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 3 | Hs.199067 | ? | AI565773 |
| | DSCR1 | Down syndrome critical region gene 1 | Hs.184222 | ? | U85267 |
| | CRHBP | corticotropin releasing hormone-binding protein | Hs.115617 | ? | NM_001882 |
| | STK39 | serine threonine kinase 39 (STE20/SPS1 homolog, yeast) | Hs.199263 | ? | F26137 |


| ubiquitin-proteasome pathway | UBD | diubiquitin | Hs.44532 | ? | NM_006398 |
| | PSMB4 | proteasome (prosome, macropain) subunit, beta type, 4 | Hs.89545 | ? | dc33od37 |
| | SSA2 | Sjogren syndrome antigen A2 (60kD, ribonucleoprotein autoantigen SS-A/Ro) | Hs.554 | ? | NM 004dU 0 |
| | USP14 | ubiquitin specific protease 14 (tRNA-guanine transglycosylase) | HS.759o1 | o r | ■ IMM UUO 1 0 1 |
| | PSMA1 | proteasome (prosome, macropain) subunit, alpha type, 1 | Hs.82159 | ? | AI889267 |
| | EIF3S9 | eukaryotic translation initiation factor 3, subunit 9 (eta, 116kD) | Hs.57783 | ? | U62583 |
| | SMT3H1 | SMT3 (suppressor of mif two 3, yeast) homolog 1 | Hs.85119 | ? | AA160893 |
| | UBE2D2 | ubiquitin-conjugating enzyme E2D 2 (homologous to yeast UBC4/5) | Hs.108332 | o € | INIVl 9 |
| | | proteasome (prosome, macropain) subunit, non-ATPase, 11 | Hs.90744 | o f | |
| | PSMB3 | proteasome (prosome, macropain) subunit, beta type, 3 | Hs.82793 | ? | AI028114 |
| | PSMD4 | proteasome (prosome, macropain) 26S subunit, non-ATPase, 4 | Hs.148495 | ? | |
| molecular chaperone | CCT5 | chaperonin containing TCP1, subunit 5 (epsilon) | Hs.1600 | ? | D43950 |
| | | chaperonin containing TCP1, subunit 3 (gamma) | Hs.1708 | ? | BE302501 |
| | HSPA5 | heat shock 70kD protein 5 (glucose-regulated protein, 78kD) | Hs.75410 | ? | AL043206 |
| | CCT4 | chaperonin containing TCP1, subunit 4 (delta) | Hs.79150 | ? | U38846 |
| | HSPA4 | heat shock 70kD protein 4 | Hs.90093 | ? | AB023420 |
| | CCT7 | chaperonin containing TCP1, subunit 7 (eta) | Hs.108809 | ? | AA314436 |
| | HSPA8 | heat shock 70kD protein 8 | Hs.180414 | ? | AW249010 |
| | CCT6A | chaperonin containing TCP1, subunit 6A (zeta 1) | Hs.82916 | ? | L27706 |


| transport | ANXA2 | annexin A2 | Hs.217493 | ? | BE293414 |
| | PDZK1 | PDZ domain containing 1 | Hs.15456 | ? | AF012281 |
| | SYPL | synaptophysin-like protein | Hs.80919 | 7 | S72481 |
| | TIMM17A | translocase of inner mitochondrial membrane 17 homolog A (yeast) | Hs.20716 | ? | AW247564 |
| | XPO1 | exportin 1 (CRM1, yeast, homolog) | Hs.79090 | ? | D89729 |
| | HMGN4 | high mobility group nucleosomal binding domain 4 | Hs.236774 | ? | U90549 |
| | NUCB2 | nucleobindin 2 | Hs.3164 | ? | AW951523 |
| | UGTREL1 | UDP-galactose transporter related | Hs.154073 | ? | AW192554 |
| | | phosphoprotein enriched in astrocytes 15 | Hs.194673 | ? | Y13736 |
| | CLTA | clathrin, light polypeptide (Lca) | Hs.104143 | ? | AW974204 |
| | ATP6IP1 | ATPase, H+ transporting, lysosomal interacting protein 1 | Hs.6551 | ? | NM_001183 |
| | SSR2 | signal sequence receptor, beta (translocon-associated protein beta) | Hs.74564 | ? | BE313059 |
| | AP3S1 | adaptor-related protein complex 3, sigma 1 subunit | Hs.80917 | ? | D63643 |
| | VDAC2 | voltage-dependent anion channel 2 | Hs.78902 | ? | AI015604 |
| | VPS45A | vacuolar protein sorting 45A (yeast) | Hs.6650 | ? | AA702845 |
| | | valosin-containing protein | Hs.106357 | ? | NM_007126 |
| | SACM2L | SAC2 (suppressor of actin mutations 2, yeast, homolog)-like | Hs.169407 | ? | AK001725 |
| | KPNB1 | karyopherin (importin) beta 1 | Hs.180446 | ? | L38951 |
| | SLC21A3 | solute carrier family 21 (organic anion transporter), member 3 | Hs.46440 | ? | U21943 |
| | SLC22A1 | solute carrier family 22 (organic cation transporter), member 1 | Hs.117367 | ? | X98332 |
| | HSPA5 | heat shock 70kD protein 5 (glucose-regulated protein, 78kD) | Hs.75410 | ? | |




metabolism 


oNrVM 


glyceronephosphate O- 
acyltransferase 


Hs.12482 


? 


AF043937 




NME2 


non-metastatic ceils 2, 
protein (NM23B) expressed 
in 


Hs.275163 


? 


L16785 




NME1 


non-metastatic cells 1,
protein (NM23A) expressed
in


Hs.118638


? 


AA147871 




UQCRH


ubiquinol-cytochrome c
reductase hinge protein


Hs.73818 


? 


AI093521 




TALDO1


transaldolase 1 


Hs.77290 


? 


AF010400 




P5CR2 


pyrroline 5-carboxylate
reductase isoform 


Hs^74287 


? 


AI161110 




GFPT1 


glutamine-fructose-6-
phosphate transaminase 1


Hs.1674


? 


NM 00205 
6 




DPM1


dolichyl-phosphate
mannosyltransferase
polypeptide 1, catalytic 
subunit 


Hs.5085 


? 


AW173486




ACLY


ATP citrate lyase


Hs.174140


? 


AW967351 




B4GALT3 


UDP-Gal:betaGlcNAc beta 
1,4-galactosyltransferase,
polypeptide 3


Hs.321231


? 


Y12609 




GCN1L1 


GCN1 (general control of 
amino-acid synthesis 1,
yeast)-like 1


Hs.75354 


? 


D86973 




DPAGT1 


dolichyl-phosphate (UDP-N-
acetylglucosamine) N-
acetylglucosaminephosphotransferase 1 (GlcNAc-1-P
transferase) 


Hs.26433 


? 


Z82022 






acetyl-Coenzyme A 
acyltransferase 1 
(peroxisomal 3-oxoacyl-
Coenzyme A thiolase)


Hs.166160 


? 


NM 00160 
7 




ALDH8A1


aldehyde dehydrogenase 8
family, member A1


Hs.18443 


? 


AI051566 






steroid-5-alpha-reductase,
alpha polypeptide 2


Hs.1989 


? 


M74047 




NAT2 


N-acetyltransferase 2
(arylamine N-
acetyltransferase)


Hs.2 


7 


D90040 




GSTZ1 


glutathione transferase zeta 
1 (maleylacetoacetate
isomerase)


Hs.26403


? 


U86529 




ADH1B 


alcohol dehydrogenase IB
(class 1), beta polypeptide 


Hs.4 


? 


M24317 




CYP2C8 


cytochrome P450, subfamily
IIC (mephenytoin 4-
hydroxylase), polypeptide 8


Hs.174220


? 


M17398
CYP2E


cytochrome P450, subfamily
IIE (ethanol-inducible)


Hs.75183 


7 


J02843 


Unknown 




ESTs. Highly similar to 
H33_HUMAN HISTONE 
H3.3 [H.sapiens]


Hs.349754 


? 


AA313375 




ECT2 


epithelial cell transforming 
sequence 2 oncogene 


Hs.122579 


? 


AL137710 




DKFZP56 
4B167 


DKFZP564B167 protein


Hs.76285 


7 


AI032331 




KIAA0016 


translocase of outer 
mitochondrial membrane 20 
(yeast) homolog 


Hs.75187 


? 


D13641




DXS1357 
E 


accessory proteins 
BAP31/BAP29 


Hs.291904 


? 


Z31696 




KIAA0475


KIAA0475 gene product 


Hs.5737 


? 


AA524523 




C20orf24 


chromosome 20 open 
reading frame 24 


Hs.184062 


? 


AI340141 




FLJ10326 


hypothetical protein 
FLJ10326 


Hs.262823 


? 


AA665998 




KIAA0117 


KIAA0117 protein


Hs.322478 


? 


AL133010






Unknown 




? 


NM 00221 
1 




DEK 


DEK oncogene (DNA
binding)


Hs.110713


? 


AI888504 




PODXL 


podocalyxin-like 


Hs.16426 


? 


BE395330 




DSS1 


Deleted in split-hand/split- 
foot 1 region 


Hs.333495 


. ? 


W79057 




PRO1855


hypothetical protein
PRO1855


Hs.283558 


7 


AI379021 






Homo sapiens mRNA; 
cDNA DKFZp434l052 (from
clone DKFZp434l052) 


Hs.378917 


7 


AA425759 




KIAA0470 


KIAA0470 gene product


Hs.25132 


7 


NM 01481 

2 




MYLE 


MYLE protein 


Hs.11902 


? 


AA628977 






Homo sapiens cDNA 
FLJ14232 fis, clone
h4T2RP4O00O35 


Hs.1018ld 


? 


AI675122 




MAGED2 


melanoma antigen, family D,

2 


Hs.4943 


? 


Z98046 




FLJ12806 


hypothetical protein 
FLJ12806


Hs.107637


? 


BE044582 


* 


YWHAB 


tyrosine 3- 

monooxygenase/tryptophan 
5-monooxygenase 
activation protein, beta 
polypeptide 


Hs^79920 


? 


AL008725 






Unknown 




7 


AL03177O 




LOC5123 
5 


hypothetical protein 


Hs.181444 


? 


AI190653




KIAA0592 


KIAA0592 protein 


Hs.13273


? 


AL080183 




KIAA0205 


KIAA0205 gene product 


Hs.3610 


? 


D86960 




C1orf9 


chromosome 1 open 
reading frame 9 


Hs.108636 


? 


BE466870 






IQAA07i38 0rotein 


Hs.246112 


? 


AB018331 


1 MQCIOSS 1 hvDothefteal protein 


Hs.334787 


? 


BE379431 








6 


MGC19556 










KIAA0731


KIAA0731 protein 


Hs.6214


7 


AB018274 




C7orf14 


chromosome 7 open 
reading frame 14 


Hs.84790 




0^978 




D123 


D123 gene product


Hs.82043


7 


U27112 




C5orf8


chromosome 5 open 
reading frame 8 


Hs.75864 


7 


BE254013 






Homo sapiens cDNA: 
FLJ23020 fis, clone
LNG00943 


Hs.6127 


7 


AA054768 




AD24 


AD24 protein


Hs.74899 


7 


AI017605 




WHIP 


Werner helicase interacting 
protein 


Hs.236828 


? 


AA4d1600 




BC-2 


putative breast 
adenocarcinoma marker 
(32kD)


Hs.12107 


? 


AF042384 




DKFZP54 
7E101 


DKFZP547E1010 protein 


Hs.323817 


7 


NM 01560 
7 




FLJ22251 


hypothetical protein
FLJ22251 


Hs^89064 


7 


AA595663 






ESTs 


Hs.89267 


7 


AA284067 




KIAA0187 


KIAA0187 gene product 


Hs.10848 


7 


Dd0009 




MPV17 


MpV17 transgene, murine
homolog, glomerulosclerosis


Hs.75659 


7 


NM 00243 
7 




MAWBP 


MAWD binding protein 


Hs.16341 


7 


AI866254






Homo sapiens cDNA 
FLJ37464 fis, clone
BRAWH2011795, weakly
similar to LIVER
CARBOXYLESTERASE
PRECURSOR (EC 3.1.1.1)


Hs.346947 


? 


N44535 






Homo sapiens SNC73 
protein (SNC73) mRNA,
complete cds 


Hs.293441 


7 


AA290845 






ESTs, Highly similar to
SMHU1B metallothionein 1B
[H.sapiens]


Hs.36102




R99207 






Homo sapiens unknown
mRNA 


Hs.367982 


? 


H72532 




FLJ12666 


hypothetical protein 
FLJ12666 


Hs.23767 


? 


AW952494 




RNAHP 


RNA helicase-related 
protein 


Hs.8765 


? 


AI814448 



*gene expression level showing at least 1.5-fold change in HCC tumors relative to non-tumor
liver tissues (P<1x10^)



Real-time RT-PCR analysis was performed on a panel of genes, including
IGFBP3 and ERBB3, in all 37 matched HCC tumor and non-tumor liver samples.
Expression of a known "housekeeping" gene, porphobilinogen deaminase (PBGD)
(Fink et al, 1998), was used as the normalizing control. The results of real-time RT-PCR
analyses of IGFBP3 and ERBB3 indicated that IGFBP3 expression was diminished in
35 of 37 HCC tumors relative to their corresponding non-tumor liver tissues (Figure
5A), while ERBB3 expression was elevated in 34 of 37 tumor samples (Figure 5B).
Since ERBB3 is defective in tyrosine kinase activity and requires dimerization with
other receptors, possibly another member of the ERBB family (Riese and Stern,
1998), the hypothesis that HCC tumors expressing high levels of ERBB3 were
associated with high expression of ERBB2 or EGFR was tested. The expression of
ERBB2 was elevated in 12 of 37 tumors (Figure 5C), while high EGFR expression
was found in 15 of 37 tumors (Figure 5D). A significant concomitant increase in
ERBB2 expression (t-test P-value = 0.0026), but no association with high EGFR
expression (t-test P-value = 0.31), was found in the top fifty percentile of high ERBB3-
expressing HCC tumors, indicating that the cognate partners of ERBB3 appeared to
be present in those tumors expressing high levels of ERBB3. Real-time and semi-
quantitative RT-PCR analyses were also conducted on a panel of genes identified as
differentially expressed in HCC (Figures 6-21).
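
The normalization and comparison described above can be illustrated with a minimal sketch. It assumes, hypothetically, that expression is read out as threshold-cycle (Ct) values and that relative expression is computed by the 2^-ddCt convention with roughly 100% amplification efficiency; the text states only that PBGD was the normalizing control and that a t-test was used, so the function names, the Ct convention, and the choice of Welch's form of the t statistic are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, median, variance


def delta_ct(ct_target, ct_reference):
    """Target Ct minus reference (e.g. PBGD) Ct; a lower value means more transcript."""
    return ct_target - ct_reference


def fold_change(ct_target_tumor, ct_ref_tumor, ct_target_normal, ct_ref_normal):
    """Tumor/non-tumor expression ratio by 2^-ddCt (assumes ~100% PCR efficiency)."""
    ddct = (delta_ct(ct_target_tumor, ct_ref_tumor)
            - delta_ct(ct_target_normal, ct_ref_normal))
    return 2.0 ** (-ddct)


def split_by_median(key_values, other_values):
    """Split other_values by whether the paired key value lies above the median key.

    Mirrors taking the top fifty percentile of ERBB3-expressing tumors and
    looking at another gene (e.g. ERBB2) in each half.
    """
    cut = median(key_values)
    high = [o for k, o in zip(key_values, other_values) if k > cut]
    low = [o for k, o in zip(key_values, other_values) if k <= cut]
    return high, low


def welch_t(a, b):
    """Welch's t statistic for two independent samples (no p-value is computed)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))
```

With per-tumor fold changes for ERBB3 and ERBB2 in hand, `split_by_median(erbb3, erbb2)` followed by `welch_t(high, low)` gives the statistic; converting it to a P-value requires a t-distribution, which is left to a statistics package.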

Validation Of HCC Tumor Discriminator Expression Cassettes

Changes in gene expression of HCC using microarray technology have been
reported (Chen et al, 2002; Okabe et al, 2001; Honda et al, 2001; Shirota et al, 2001;
Tackels-Horne et al, 2001; Xu et al, 2001a; Xu et al, 2001b). The intersection of the
important gene lists, derived as described herein, with gene lists reported previously
was explored, and resulted in the identification of additional gene lists or "expression
cassettes" (Tables 2-4) that were capable of distinguishing HCC tumor from non-
tumor liver tissues.
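
As a hedged illustration of how such an intersection might be computed, the sketch below keeps the array features whose UniGene identifier also occurs in a previously reported list, and counts the unique UniGenes among them. The function name and the (feature, UniGene) pair layout are assumptions; the feature labels in the usage note are invented for the example.

```python
def overlap_features(array_features, published_ids):
    """Return (kept, unique): array features whose UniGene ID was reported
    previously, and the set of distinct UniGene IDs among them.

    array_features: iterable of (feature_id, unigene_id) pairs
    published_ids:  iterable of UniGene IDs pooled from earlier reports
    """
    published = set(published_ids)
    kept = [(feat, ug) for feat, ug in array_features if ug in published]
    unique = {ug for _, ug in kept}
    return kept, unique
```

Several features can map to one UniGene (for example two probes annotated Hs.101408), which is why the number of overlapping features can exceed the number of unique UniGenes.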

In the first gene list, a total of 265 features, containing 245 unique UniGenes
from the microarray used herein, were observed to overlap (Table 2). Hierarchical
clustering analyses based on expression levels of these 265 'overlap' features
separated the tissue set into two distinct groups of tumor and non-tumor, with five
tissue samples misclassified. Such clustering was significant (Pa<1x10^) based on
random permutation testing of sample labels. The likelihood of a randomly chosen set
of 265 features producing five or fewer samples misclassified was low (Pb=1.5x10^).
Thus, these 265 'overlap' features could distinguish HCC tumor from non-tumor liver
with reasonable precision, and the features were unlikely to appear by chance. Among
these genes were smaller subgroups characterized by distinct gene expression
signatures involving potentially different pathways. A cholesterol biosynthetic pathway
was characterized by higher expression in HCC tumors for genes of the enzymes
SQLE, ACLY and FDPS. A subgroup involved in growth and differentiation was
characterized in the HCC tumor tissues by lower expression of ESR1, IGFBP3 and
PDGFRα, and high expression of PTTG1. A subgroup of bZIP transcription factors
ATF3, FOS, JUN, and MYBL2 was characterized to be down-regulated in the HCC
tumor tissues.
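
The label-permutation significance test mentioned above can be sketched as follows. This is an illustrative assumption about the procedure, not the method actually used: it takes a fixed two-cluster assignment, counts misclassified samples against the tumor/non-tumor labels, and estimates how often randomly shuffled labels do at least as well. The function names, the 0/1 encoding, and the number of permutations are hypothetical, and the sketch covers only the sample-label test (Pa), not the random-feature-set test (Pb).

```python
import random


def misclassified(clusters, labels):
    """Smallest number of samples disagreeing with either matching of the
    two clusters (0/1) to the two labels (0/1)."""
    mismatches = sum(1 for c, l in zip(clusters, labels) if c != l)
    return min(mismatches, len(labels) - mismatches)


def permutation_p(clusters, labels, n_perm=10000, seed=0):
    """Estimate P(misclassified <= observed) under random shuffling of labels."""
    rng = random.Random(seed)
    observed = misclassified(clusters, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if misclassified(clusters, shuffled) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

The estimate cannot fall below 1/(n_perm + 1), so resolving a very small P-value requires correspondingly many permutations.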

Table 2. Intersection of microarray expression dataset with HCC Genes



UniGene

Identifier 


Gene 


Description 


GenBank No. 


Hs.101408 


BCAT2 


branched chain aminotransferase 2, 
mitochondrial


BE264265. 
AA436410 


Hs.101408 


BCAT2


branched chain aminotransferase 2, 
mitochondrial 


NM 001190. 
AA436410 


Hs.102664 


VAMP4 


vesicle-associated membrane protein 4


AL035296, 
AA424813 


Hs.10319 


UGT2B7 


UDP glycosyltransferase 2 family,
polypeptide B7 


J05428. 
AA746229 


Hs.10359 




ESTs 


AW316760. 
AA630881 


Hs.106061 


RDBP 


RD RNA-binding protein


X16105. 
AA056390 


Hs.107253


DKFZP761F241


hypothetical protein DKFZp761F241
Homo sapiens cDNA: FLJ20925 fis, clone
ADSE00963 


AW519080. 
R20416 


Hs.108441 


HAAO 


3-hydroxyanthranilate 3,4-dioxygenase


NM 012205. 
T80846 


Hs.108636 


C1orf9 


chromosome 1 open reading frame 9
CHI MEMBRANE PROTENIN CHI 


BE466870. 
N36176 


Hs.110613 


SMG1 


PI-3-kinase-related kinase SMG-1
KIAA0220 KIAA0220 protein


AB007881. 

R97225 


Hs.11314 


DKFZP564N13 
6 


DKFZP564N1363 protein 


AI360105. 
T87343 


Hs.115617


CRHBP


corticotropin releasing hormone-binding
protein 


NM 001882. 
AA286752 


Hs.118087


KIAA0610


KIAA0610 protein


AB011182. 
N38860 


Hs.118638


NME1


non-metastatic cells 1, protein (NM23A)
expressed in 


AA147871. 
AA644092 


Hs.118666


PP591


hypothetical protein PP591

human clone 23759 mRNA, partial cds


U79241. 
AA626336 


Hs.119651


GPC3


glypican 3


U50410, 
AA775872 


Hs.12451 


EMAPL 


echinoderm microtubule-associated
protein-like 


NM 004434, 
AA447196 


Hs.12482


GNPAT


glyceronephosphate O-acyltransferase


AF043937,
AA486845


Hs.125180 


GHR 


growth hormone receptor 


X06562. N70358 


Hs.125359 


THY1 


Thy-1 cell surface antigen 


N94350. 

AI346653 

AA428836 


Hs.1265 


BCKDHB 


branched chain keto acid dehydrogenase 
E1 , beta polypeptide (maple syrup urine 
disease) 


NM_P00066. 

i\rV*^ f i 0\3 


Hs.1279 


C1R 


complement component 1 , r 
subcomponent 


M14058,
AA041382 
AF045649. 
T69603 


r1S.l0999 


I\I/aMU/UU 


i\i/\r\\/ # uvf proceiii 


AI018400. 
N55167 


lis.lAou 


F11


coagulation factor XI (plasma 
thromboplastin antecedent) 


AF045649. 
R88d90 


Hs.144oo 




interferon consensus sequence binding 
protein 1 


AW964220. 
N62269 


rlS.1445o 




interferon consensus sequence binding 
protein 1 


AA514545. 
N62269 


Hs.144904




nuclear receptor co-repressor 1


AB028970. 
AA085748 


Hs.144904 


NCOR1 


nuclear receptor co-repressor 1


AA468619. 
AA085748 


Hs.145567 


AF038169 


hypothetical protein


AI694342. 
AA406301 


Hs.145567


AF038169


hypothetical protein


AI963556. 
AA406301 


Hs.146360 


IFITM1 


interferon induced transmembrane protein 
1 (9-27) 


AA428847, 
AA419251 


Hs.14838


FLJ10773 


likely ortholog of mouse NPC derived 


AA044181, 

R93542. 

AA401264 


He 1<^A7 




chromosome 1 open reading frame 16 
KIAA0250 KIAA0250 gene product


D87437. 
AA431423 


ilo. 1 x9 1 ^ 1 0 


TARBP1


TAR (HIV) RNA-binding protein 1


NM 005646. 
N62244 


Hs.15154 


SRPX 


sushi-repeat-containing protein, X
chromosome 


NM 006307. 
AA448569 


Hs.1531 


EHHADH 


enoyl-Coenzyme A, hydratase/3-
hydroxyacyl Coenzyme A dehydrogenase


L07077. R02373 


Hs.1531 


EHHADH 


enoyl-Coenzyme A, hydratase/3-
hydroxyacyl Coenzyme A dehydrogenase


AI800553, 
R02373 


Hs.153357


PLOD3


procollagen-lysine, 2-oxoglutarate 5-
dioxygenase 3


AF046889, 
AA459305 


Hs.154890


FACL2


fatty-acid-Coenzyme A ligase, long-chain 2


D10040. T73556 


Hs.155079


PPP2R5A


protein phosphatase 2, regulatory subunit
B (B56), alpha isoform


AA234460, 
R59164 


Hs.155560 


CANX 


calnexin 


AA203197. 
AA126265


Hs.155637


PRKDC 


protein kinase, DNA-activated, catalytic 
polypeptide 


U34994. R27615 


Hs.155956


NAT1


N-acetyltransferase 1 (arylamine N-
acetyltransferase)


R79401.T67128 


Hs.157148 


MGC13204 


hypothetical protein MGC13204

Homo sapiens cDNA FLJ11883 fis, clone


BE262748,
N62451
rltMcJAllJU/ 1 fO




Hs.1578 


BIRC5 


baculoviral IAP repeat-containing 5
(survivin)


AW247335, 
AA460685 


Hs.159301


IL18R1


interleukin 18 receptor 1


U43672, 
AA482489 


Hs.160318 


FXYD1 


FXYD domain-containing ion transport 
regulator 1 (phospholemman) 


AI125364,
H57136 


Hs.160786


ASS 


argininosuccinate synthetase 


BE393272, 
AA676466 


Hs.16341 


MAWBP 


MAWD binding protein 

ESTs, Weakly similar to predicted using

Genefinder [C. elegans]


AI866254. 
R54416 


Hs.16426


PODXL 


podocalyxin-like 


BE395330, 
N64508 


Hs.166891


RFX5


regulatory factor X, 5 (influences HLA class
II expression)


AL050135, 
AA418045 


Hs.167382


NPR1


natriuretic peptide receptor A/guanylate
cyclase A (atrionatriuretic peptide receptor 
A) 


AA598841 


Hs.167529 


CYP2C9 


cytochrome P450, subfamily IIC 
(mephenytoin 4-hydroxylase), polypeptide 
9 


M61857, R89491 


Hs.169517 


ALDH1B1 


aldehyde dehydrogenase 1 family, member
B1 


M63967. R93550 


Hs.169756 


C1S


complement component 1, s
subcomponent 


NM 001734, 

AA055520. 

To204o 


Hs.169907 


GSTA4 


glutathione S-transferase A4


AA152346


Hs.170001


EIF2B2 


eukaryotic translation initiation factor 2B,
subunit 2 (beta, 39kD)


AA678061,
R86304 


Hs.170001


EIF2B2


eukaryotic translation initiation factor 2B,
subunit 2 (beta, 39kD)


A CT/IO COO i\ 

ArUoozoO, 
R86304 


Hs.170133


FOXO1A


forkhead box O1A (rhabdomyosarcoma)


AA448277 


Hs.171955 


TROAP 


trophinin associated protein (tastin)


U04810. H94949 


Hs.172665 


MTHFD1 


methylenetetrahydrofolate dehydrogenase
(NADP+ dependent),
methenyltetrahydrofolate cyclohydrolase,
formyltetrahydrofolate synthetase


NM_005956. 
rtiO/ /o 


Hs.173717


PPAP2B


phosphatidic acid phosphatase type 2B


AI458142. 

1 /Ivf D 


Hs.173880


IL1RAP 


interleukin 1 receptor accessory protein 


AB006537, 

Ko59U2, 

AA256132 


Hs.174140


ACLY 


ATP citrate lyase 


AW967351. 
HUo047 


Hs.174220


CYP2C8


cytochrome P450, subfamily IIC
(mephenytoin 4-hydroxylase), polypeptide 
8 


M17398, N53136


Hs.177592


RPLP1


ribosomal protein, large, P1


AW963733, 
AI732304 


Hs.17767 


KIAA1554


KIAA1554 protein 


AI625594, 

AA857573, 

H17860 


Hs.179718 


MYBL2 


v-myb avian myeloblastosis viral oncogene 


X13293,
homolog-like 2


AA456878 


Hs.180383 


DUSP6 


dual specificity phosphatase 6


AB013382. 
AA630374 


Hs.180919 


ID2


inhibitor of DNA binding 2, dominant 
negative helix-loop-helix protein 


AI950041. 
H82442 


Hs.181345 


SAH 


SA (rat hypertension-associated) homolog 


AI632754, 
N73827 


Hs.182018 


IRAK1


interleukin-1 receptor-associated kinase 1 


NM 001569. 

AI202323, 

AA683550 


Hs.18212 


DXS9879E


DNA segment on chromosome X (unique) 
9879 expressed sequence 


W73156, 
AA479062 


Hs.182575


SLC15A2


solute carrier family 15 (H+/peptide
transporter), member 2


S78203, 
AA425352 


Hs.1827 


NGFR 


nerve growth factor receptor (TNFR 
superfamily, member 16)


NM 002507, 
R55303 


Hs.183858 


TIF1 


transcriptional intermediary factor 1 


AF119042,

R38345. 

AA016972 


Hs.18443 


ALDH8A1 


aldehyde dehydrogenase 8 family, member
A1 

ESTs 


AI051566 
N70701 


Hs.184697




Homo sapiens clone 23785 mRNA 
sequence 


AF035307, 
AA041362, 
AA663440 


Hs.18676 


SPRY2 


sprouty (Drosophila) homolog 2


NM_005842, 
AA453769 


Hs.194660


CLN3 


ceroid-lipofuscinosis, neuronal 3, juvenile 
(Batten, Spielmeyer-Vogt disease) 


AW249073, 
W37752 


Hs.194673


PEA15 


phosphoprotein enriched in astrocytes 15 


Y13736, 
AA293211 


Hs.19554


C1orf2 


chromosome 1 open reading frame 2 


NM 006589. 
H1i4(B4 


Hs.198282 


PLSCR1 


phospholipid scramblase 1


AB006746, 
N25945 


HS.1 8904 


CTH 


cystathionase (cystathionine gamma-lyase)


S52784. R07167 


Hs.20144 


SCYA14 


small inducible cytokine subfamily A (Cys- 
Cys), member 14


NM_004166. 
R96626 


Hs.2030 


THBD 


thrombomodulin


NM_000361,
H59861


Hs.20315 


IFIT1 


interferon-induced protein with
tetratricopeptide repeats 1


NM_001548,
AA157787


Hs.2128 


DUSP5 


dual specificity phosphatase 5 


NM_004419, 
W65460


Hs.213289 


LDLR 


low density lipoprotein receptor (familial
hypercholesterolemia)


NM_000527,
AA504461 


Hs.21413 


SLC12A5 


solute carrier family 12,


U79245, 

MrVlOOOOO 


Hs.21635 


TUBG1 


tubulin, gamma 1 


NM 001070, 
I///32 


Hs.2178


H2BFQ 


H2B histone family, member Q 


BE245642. 
AAbl0223 


Hs.227656 


XPR1 


xenotropic and polytropic retrovirus
receptor


AL137583,
AA453474 


Hs.23642 


HSU79266 


protein predicted by clone 23627 


U79266, W95346


Hs.237356 


SDF1 


stromal cell-derived factor 1 


AA442810,
AA44 / 1 1 0


Hs.237356 


SDF1 


stromal cell-derived factor 1 


L36033, 


Hs.23767 


FLJ12666


hypothetical protein FLJ12666

Homo sapiens cDNA FLJ12666 fis, clone

NT2RM4002256 


AW952494. 
H10192. 
AA115300,
AA131466 


Hs.239 


FOXM1


forkhead box M1


Uo31i3. 
AA129552


Hs.239069 


FHL1 


four and a half LIM domains 1 


AA725097, 
AA455925 


Hs.239758


FLJ12389


hypothetical protein FLJ12389 similar to
acetoacetyl-CoA synthetase
Homo sapiens cDNA FLJ12389 fis, clone
MAM MAI 002o71 , weakly similar to Acetyl-
coenzyme A synthase (EC 6.2.1.1)


AI697801. 
R48270 


Hs.241561 


PRSS2 


protease, serine, 2 (trypsin 2) 


U06O0I, 
AA284528 


Hs.2430 


TCFL1 


transcription factor-like 1 


A A ~7£\ COO 

AA705337, 
AA4439S0 


Hs.24950 


RGS5 


regulator of G-protein signalling 5 


AI674877. 

N34362, 

AA668470 


Hs.252587


PTTG1 


pituitary tumor-transforming 1 


AA20347D, 
AA430032 


Hs.25313


MCRS1


microspherule protein 1


APO60OO7, 

AA488757 


Hs^5475 


AQP7 


aquaporin 7 


AW7/y/Ol, 

H27752, 


Hs.256583


ILF3


interleukin enhancer binding factor 3, 90kD


AF007140, 


Hs.256583 


ILF3 


interleukin enhancer binding factor 3, 90kD 


NM_012218,
AA44Q04fi 


Hs.25797 


SF3B4 


splicing factor 3b, subunit 4, 49kD 


NM_005850. 


Hs.262958 


DKFZP434B04 
4 


hypothetical protein DKFZp434B044 

ESTs


AA541776, 


Hs.26403 


GSTZ1 


glutathione transferase zeta 1 
(maleylacetoacetate isomerase)


U86529, 
AA428334 


Hs.264330 


ASAHL 


N-acylsphingosine amidohydrolase (acid
ceramidase)-like


nPoft7nn7 

Otl^O#UU/« 

W47576 


Hs^7289 


POLA 


polymerase (DNA directed), alpha 


AA7n7fi'50 


Hs^699 


GPC1 


glypican 1 


NM_002081, 


Hs.270256




Homo sapiens clone IMAGE:1963178,

mRNA sequence

ESTs 


AI355014. 
R10140 


Hs.270845


KNSL5


kinesin-like 5 (mitotic kinesin-like protein 1)


H63163. 
AA452513 


Hs.279607


CAST 


calpastatin 


U38525. M78523 


Hs.284142 


C21orf4 


chromosome 21 open reading frame 4 


BE256S59. 
W69668 


Hs.284142


C21orf4 


chromosome 21 open reading frame 4 


BE142872. 
W69668
Hs.28465




Homo sapiens cDNA: FLJ21869 fis, clone
HEP02442 


AW582012. 
R63929 


Hs.^8650 


AQP4 


aquaporin 4 


NM 001650. 
H09087 


Hs.291904


DXS1357E 


accessory proteins BAP31/BAP29 


Z31696. 
AA625628 


Hs.2934 


RRM1 


ribonucleotide reductase M1 polypeptide


X59543. 
AA633549 


Hs.293970 


ALDH6A1 


methylmalonate-semialdehyde
dehydrogenase 


C00821. H63534. 

AA196160, 

H63534 


Hs.293970 


ALDH6A1 


methylmalonate-semialdehyde
dehydrogenase 


M93405, N62179, 

AA196160. 

H63534 


Hs.294151


KIAA1917


KIAA1917 protein


BE222511. 
AA452113 


Hs.295923


SIAH1


seven in absentia (Drosophila) homolog 1 


AA935716, 
T71889 


Hs.296049 


MFAP4 


microfibrillar-associated protein 4


L38486, 
AA442695 


Hs.296259


PON3


paraoxonase 3 


L48516. R95740. 
T57069 


Hs.296341 


CAP2 


adenylyl cyclase-associated protein 2


AW779995. 
AA040613 


Hs.296341 


CAP2 


adenylyl cyclase-associated protein 2


U02390, 
AA040613 


Hs.30151 




ESTs. Weakly similar to JC6238 
galactosylceramide-like protein, GCP
[H.sapiens]


AA926994, 
N73670 


Hs.30340 


KIAA1165 


hypothetical protein KIAA1165


AB032991, 
AA449330 


Hs.30340 


KIAA1165 


hypothetical protein KIAA1165


AA770150. 
AA449330 


Hs.3416 


ADFP 


adipose differentiation-related protein 


NM_001122, 

AA700054, 

AA142916 


Hs.35120 


RFC4 


replication factor C (activator 1) 4 (37kD)


AA600213, 
N93924 


Hs.3530 


FUSIP2 


FUS-interacting protein (serine-arginine
rich) 2

TLS-associated serine-arginine protein 2


AK001656. 
H1ld42 


Hs.36102 




ESTs. Highly similar to SMHU1B 
metallothionein 1B [H.sapiens]


R99207. H72722 


Hs.37009 


ALPI 


alkaline phosphatase, intestinal 


NM_001631. 
AA190871


Hs.38163 




Homo sapiens, Similar to hypothetical
protein, MGC:7035, clone MGC:20737 
iivi/\v5C.*fOOOOOO, mrxiN/A, \Aiiii|iic;ic \«U9 
ESTs 


AW074863, 
H63116 


Hs.3873 


PPT1 


palmitoyl-protein thioesterase 1 (ceroid-
lipofuscinosis, neuronal 1, infantile)


AL037943. 
AA03425O 


Hs.388 


NUDT1 


nudix (nucleoside diphosphate linked
moiety X)-type motif 1


AI656937. 
M443998 


Hs.4 


ADH1B 


alcohol dehydrogenase IB (class I), beta
polypeptide


M24317. N93428 


Hs.41726


SERPINB8


serine (or cysteine) proteinase inhibitor,


NM_002640,
clade B (ovalbumin), member 8


W60100 


Hs.4187 


LOC55977 


hypothetical protein 24636 


AI066576, 
N62562 


Hs.42650 


ZWINT 


ZW10 interactor 


AW409765, 
AA706968 


Hs.44532 


UBD 


diubiquitin


NM 006398, 
N33920 


Hs.460 


ATF3 


activating transcription factor 3 


N39944, H21041 


Hs.4742 


GPAA1 


anchor attachment protein 1 (Gaa1p,
yeast) homolog


NM 003801. 
AA465301 


Hs.4756 


FEN1 


flap structure-specific endonuclease 1 


BE278623. 
AA620553 


Hs.4788 


NCSTN 
KIAA0253 


Nicastrin
nicastrin 


D67442. R96527 


Hs.4788 


NCSTN 
KIAA0253


Nicastrin
nicastrin 


BE1 79772, 
R96527 


Hs.4d34d 


HH114 


hypothetical protein HH114 

Homo sapiens clone HH114 unknown

mRNA 


AA428370, 
AA130117 


Hs.4854


CDKN2C


cyclin-dependent kinase inhibitor 2C (p18,
inhibits CDK4) 


AF041248, 
N72115 


Hs.49265




ESTs 


AI141174, 
AI140241 


Hs.49912


PXMP2


peroxisomal membrane protein 2 (22kD)


BE393339, 
N70714 


Hs.50758 


SMC4L1 


SMC4 (structural maintenance of 
chromosomes 4, yeast)-like 1 
CAP-C chromosome-associated 
polypeptide C 


AB019987, 
AA452095 


Hs.50966 


CPS1 


carbamoyl-phosphate synthetase 1,
mitochondrial 


Y15793, N68399 


Hs.50966 


CPS1 


carbamoyl-phosphate synthetase 1,
mitochondrial


AA113231,
N68399 


Hs.5333 


KIAA0711 


KIAA0711 gene product


NM_014867,
AA702544 


Hs.5719 


CNAP1 


chromosome condensation-related SMC-
associated protein 1


D63880, 
AA668256 


Hs.5719 


CNAP1


chromosome condensation-related SMC-
associated protein 1


NM 014865, 
AA668256 


Hs.572 


ORM1 


orosomucoid 1


X02544, 
AA700876 


Hs.574


FBP1


fructose-1,6-bisphosphatase 1


M19922,
AA699427 


Hs.5897 




Homo sapiens mRNA; cDNA 
DKFZp586P1622 (from clone
DKFZp586P1622)


AI383214, 
T59658 


Hs.61638 


MYO10 


myosin X 


AI198676, 


Hs.6551 


ATP6S1 


ATPase, H+ transporting, lysosomal
(vacuolar proton pump), subunit 1


NM 001183. 
AA487588 


Hs.6566 


TRIP13 


thyroid hormone receptor interactor 13


BE090548. 
AA630784 


Hs.66 


IL1RL1 


interleukin 1 receptor-like 1


AB012701. 
AA125917 


Hs.6838 


ARHE 


ras homolog gene family, member E
ESTs


W03441,
AA443302
oCfUalcllt? t?|JvlAlUao6


AF098865. 
R01118 


Hs.71622 


SMARCD3 


SWI/SNF related, matrix associated, actin
dependent regulator of chromatin,
subfamily d, member 3
Weakly similar to KIAA0319 [H. sapiens]


U66619. 
AA136103 


Hs.737 


ETR101 


immediate early protein


AA194084,

AA496359 


Hs.738 


RPL14 


ribosomal protein L14
early growth response 1 


BE410686. 
AA486533 








NM_003993,
AA282845 


Hs 740 


PTK9 




NM 005607. 
AA291486 


H<i 741 20 


APMP 

A\i IVIiC 




AI093004. 

W94684 


He 7.4.1711 


MT1E


metallothionein 1E (functional)


H72532. 
AA872383 


Mc 74<^R1 
ns« # *IOD 1 




alpha-2-macroglobulin


NM 000014. 
AA775447 


Hs.74566 


DPYSL3 


dihydropyrimidinase-like 3


D78014, 
AI831083 


Hs.74579 


KIAA0263 


KIAA0263 gene product 


D87452. 
AA634464 


Hs.74615




platelet-derived growth factor receptor, 
alpha polypeptide 


M21574. rl23235 


Hs.74615 


PDGFRA 


platelet-derived growth factor receptor, 
alpha polypeptide 


AW887370, 
H23236 


Hs.74711 


DNAJC8 


DnaJ (Hsp40) homolog, subfamily C,
member 8 

Splicing factor similar to dnaJ 


/VA513o69, 
T60163 


Hs.748 


FGFR1 


fibroblast growth factor receptor 1 (fms- 
related tyrosine kinase 2, Pfeiffer 
syndrome) 


X66945, R54610 


Hs.75103 


YWHAZ 


tyrosine 3-monooxygenase/tryptophan 5- 
monooxygenase activation protein, zeta 
polypeptide


AA911031. 
AA609598. 
H94670, 
AA485749 


Hs.75103




tyrosine 3-monooxygenase/tryptophan 5-
monooxygenase activation protein, zeta
polypeptide 


BE315169 
AA609598. 
H94670. 
AA485749 


Hs.75106 


CLU 


clusterin (complement lysis inhibitor, SP-
40,40, sulfated glycoprotein 2,
testosterone-repressed prostate message
2, apolipoprotein J)


AA292226, 

MrVfO**- too 


Hs.75117 


ILF2 


interleukin enhancer binding factor 2, 45kD


AA307289. 
AA894687, 
H95638 


Hs.75117 


ILF2 


interleukin enhancer binding factor 2, 45kD


AA601029,
AA894687,
H95638


Hs.75196 


BAT8 


HLA-B associated transcript 8 

G9A ankyrin repeat-containing protein 


NM 006709. 
AA434117 


Hs.75216 


PTPRF 


protein tyrosine phosphatase, receptor
type, F


F08552. 
AAS98513
Hs.76318


TUBA1 


tubulin, alpha 1 (testis specific) 


X06966, R36063, 
AA180742


Hs.75361 


PK1.3 


gene from NF2/meningioma region of 
22q12 


AB023200. 
/\A70004o 


Hs.75438 


QDPR 


quinoid dihydropteridine reductase 


AA159812, 
R38198


Hs.75545 


IL4R 


interleukin 4 receptor 


X52425. 
AA292025 


Hs.75572 


CPB2 


carboxypeptidase B2 (plasma,
carboxypeptidase U) 


NM_001872. 
ri47837 


Hs.76618 


RAB11A 


RAB11A, member RAS oncogene family


BE122870,
AA025058 


Hs.76658 


PYGB 


phosphorylase, glycogen; brain


U47025. 
AA922705 


Hs.75678 


FOSB 


FBJ murine osteosarcoma viral oncogene 
homolog B


L49169. T61948 


Hs.75812


PCK2


phosphoenolpyruvate carboxykinase 2
(mitochondrial)


X92720,
AA186901


Hs.76252 


EDNRA 


endothelin receptor type A 


D90348. 
AA450009 


Hs.76325 


SLU7


ESTs, Highly similar to IGJ_HUMAN
IMMUNOGLOBULIN J CHAIN [H.sapiens]
step II splicing factor SLU7


AW172754,
T70057 


Hs.7645 


FGB 


fibrinogen, B beta polypeptide


AW589878,
H91121, T73858


Hs.76461 


RBP4 


retinol-binding protein 4, plasma 


AF074657, 
T72076 


Hs.7647


MAZ


MYC-associated zinc finger protein
(purine-binding transcription factor)


BE264373, 
AA704613 


Hs.77266


EZH2 


enhancer of zeste (Drosophila) homolog 2 


U52965. 
AA428252 


Hs.77326 


IGFBP3 


insulin-like growth factor binding protein 3 


BE336944. 
AA598601 


Hs.77393 


FDPS 


farnesyl diphosphate synthase (farnesyl
pyrophosphate synthetase,
dimethylallyltranstransferase,
geranyltranstransferase) 


D14697, T65790 


Hs.77597


PLK


polo (Drosophila)-like kinase


X75932. 


Hs.77667 


LY6E 


lymphocyte antigen 6 complex, locus E 


NM 002346. 
AA865464 


Hs.77854 


RGN


regucalcin (senescence marker protein-30) 


AB032064, 
H05140 


Hs.78045 


ACTG2 
TFPI2 


actin, gamma 2, smooth muscle, enteric 
tissue pathway inhibitor 2 


NM_001615. 
AA293402 


Hs.78465 


JUN 


v-jun avian sarcoma virus 17 oncogene 
homolog 


AI885769. 

W96134 


Hs.78524 


HTCD37 


TcD37 homolog 


/\I^OO*rO*f', 

AA022472. 
AA456635 


Hs.78865 


TAF6 


TAF6 RNA polymerase II, TATA box 
binding protein (TBP)-associated factor, 80
kD 


NM 005641, 
R19071 


Hs.789 


GRO1


GRO1 oncogene (melanoma growth
stimulating activity, alpha)


NM 001511. 
W469d0 


Hs.78996 


PCNA 


proliferating cell nuclear antigen


AI624204,
AA450264


Hs.79078 


MAD2L1 


MAD2 (mitotic arrest deficient, yeast, 
homolog)-like 1


NM_002358, 
AA481076 


Hs.79081 


PPP1CC


protein phosphatase 1, catalytic subunit,
gamma isoform


NM_002710,
AA129930


Hs.79088


RCN2 


reticulocalbin 2, EF-hand calcium binding 
domain 


AL120373,
AA598676 


Hs.79334 


NFIL3 


nuclear factor, interleukin 3 regulated 


X64318, 
AA633811


Hs.79404 


D4S234E 


neuron-specific protein 


AA975473, 
AA875888 


Hs.80248 


RBPMS 


RNA-binding protein gene with multiple
splicing


D84107, T98807 


Hs.80658 


UCP2 


uncoupling protein 2 (mitochondrial, proton 
carrier) 


AW192446. 
H61242 


Hs.81170


PIM1


pim-1 oncogene


M54916. 
AA447730 


Hs.8136 


EPAS1 


endothelial PAS domain protein 1 
Homo sapiens clone 23698 mRNA
sequence


U51626. R24882 


Hs.81687


NME3 


non-metastatic cells 3, protein expressed 
in 


U29656, 
AA398218 


Hs.81848


RAD21 


RAD21 (S. pombe) homolog 


NM_006265, 
AA683102 


Hs.81892 


KIAA0101 


KIAA0101 gene product


D14657.We8219 


Hs.82042 


SLC23A1 


solute carrier family 23 (nucleobase
transporters), member 1


D87075, N23766


Hs.821 


BGN 


Biglycan

Zinc finger protein homologous to Zfp92 in
mouse 


NM 001711. 
R77226, N51018 


Hs.82112


IL1R1


Interleukin 1 receptor, type I


M27492, R56687, 
AA464525 


Hs.82273


FLJ20152 


hypothetical protein 


AI536745. 
AA446864 


Hs.82503 




Homo sapiens cDNA FLJ30550 fis, clone 
BRAWH2001502 

Homo sapiens mRNA for 3 UTR of 
unknown protein 


Y09836, 
AA670382 


Hs.8265 


TGM2 


transglutaminase 2 (C polypeptide, protein-
glutamine-gamma-glutamyltransferase)


M9o479, 
AA156324


Hs.82794 


CETN2 


centrin, EF-hand protein, 2


NMJEMl4o44, 


Hs.82906 


CDC20 


CDC20 (cell division cycle 20. S. 
cerevisiae, homolog) 


BE293657 


Hs.8294 


KIAA0196


KIAA0196 gene product


INIVI VIHOHO 


Hs.82962


TYMS


thymidylate synthetase


IMM UUlU/1 


Hs.83164


COL15A1 


collagen, type XV, alpha 1


LU loHr 


Hs.83753 


SNRPB 


polypeptides B and B1 


BE252108 


Hs.86368 


CLGN 


calmegin 


NM 004362 


Hs.86724 


GCH1 


GTP cyclohydrolase 1 (dopa-responsive
dystonia) 


Z29433 


Hs.87409 


THBS1 


thrombospondin 1


NM 003246 


Hs.8765 


RNAHP 


RNA helicase-related protein


AI127821 


Hs.8867 


CYR61


cysteine-rich, angiogenic inducer, 61


Y12084 


Hs.8889 


SHMT1 


serine hydroxymethyltransferase 1


Al7617i4 
Hs.89538


CETP 


cholesteryl ester transfer protein, plasma


M301d5. 


Hs.89691 


UGT2B4 


UDP glycosyltransferase 2 family,
polypeptide B4 


A ^^^^^^ M ^^^^^^ 

AF064200 


Hs.89771 


GCKR 


glucokinase (hexokinase 4) regulatory
protein 


NMJ001486 


Hs,91 81 3 


PI V*ZJ\£. 


hi ifvrnnhifin subfamilv 2. mernt>er A2 


AI636514 


Hs.93002 




ubiquitin-conjugating enzyme E2C


AI637467 


Hs.93194 


APOA1




X00566 


Hs.93210 


C8A 


complement component 8, alpha


M16974 


Hs.93597 


CDK5R1


cyclin-dependent kinase 5, regulatory
subunit 1 (p35)


T04872 


Hs.93697 


CDK5R1 


cyclin-dependent kinase 5, regulatory
subunit 1 (p35)


AW088206 


Hs.93832 


LOC54499 


putative membrane protein 


AW081809 


Hs.94360 


MT1L 


metallothionein 1L


F26137 


Hs.94382


ADK 


adenosine kinase 


NM 001123 


Hs.9568 


ZNF261 


zinc finger protein 261


X95808 


Hs.9568 


ZNF261 


zinc finger protein 261


NM 005096 


Hs.95998 


FRDA 


Friedreich ataxia


AW409831 


Hs.9629


PRCC 


papillary renal cell carcinoma
(translocation-associated) 


BE258195 


Hs.9670 


FLJ10948


hypothetical protein FLJ10948 


AA805411 



In the second gene list, a total of 230 features, containing 166 unique
UniGenes from the 218 significant gene list (containing 213 unique UniGenes)
identified herein (Table 1) were observed to overlap (Table 3). Hierarchical
clustering analysis based on the expression levels of these 230 'overlap' features
separated the tissue set into distinct tumor and non-tumor groups, with four tissue
samples misclassified. Random permutation of sample labels indicated that the
clustering was significant (Pa<1x10"*) and it was unlikely that a randomly chosen set
of 230 features could produce four or fewer samples misclassified (Pb<1x10'^). These
230 'overlap' features are therefore able to discern HCC tumor from non-tumor liver.
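
The random-feature-set comparison described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: nearest-centroid assignment stands in for the hierarchical clustering named in the text, and the function names, the number of random draws and the seed are all hypothetical.

```python
import random

def misclassified(data, labels, features):
    """Count samples whose nearest class centroid (over `features`) is the wrong class.

    Nearest-centroid assignment is an assumption of this sketch; the text
    itself reports results from hierarchical clustering.
    """
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        rows = [data[i] for i, lab in enumerate(labels) if lab == c]
        centroids[c] = [sum(r[f] for r in rows) / len(rows) for f in features]
    errors = 0
    for row, lab in zip(data, labels):
        def dist(c):
            # Squared Euclidean distance to the centroid of class c.
            return sum((row[f] - m) ** 2 for f, m in zip(features, centroids[c]))
        if min(classes, key=dist) != lab:
            errors += 1
    return errors

def random_feature_pvalue(data, labels, n_features, observed_errors,
                          n_perm=1000, seed=0):
    """Empirical probability that a random feature set does as well as observed."""
    rng = random.Random(seed)
    all_features = range(len(data[0]))
    hits = 0
    for _ in range(n_perm):
        subset = rng.sample(all_features, n_features)
        if misclassified(data, labels, subset) <= observed_errors:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

With an expression matrix of samples by features, a call such as random_feature_pvalue(data, labels, 230, 4) would estimate how often 230 randomly chosen features misclassify four or fewer samples.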



Table 3. Intersection of Significant Genes Identified Herein with HCC Genes



UniGene 
identifier


Gene 


Description


GenBank No. 


Hs.103804


HNRPU 


heterogeneous nuclear ribonucleoprotein U
(scaffold attachment factor A)


X65488, T97547. 
AA496741 


Hs.104143 


CLTA 


clathrin, light polypeptide (Lca)


AW974204, 
AA1 13872 


Hs.105465


SNRPF 


small nuclear ribonucleoprotein polypeptide
F


AA649986. 
AA668189 
Hs.108332


UBE2D2


(homologous to yeast UBC4/5) 


NM 003339. 
AA1 59600. 
AA431868 


HS. 10642 


RAN


RAN, member RAS oncogene family


NM 006325. 
AA456636 


I4c iflAAA 

ris.iuo*i-o 




KIAA0187 gene product


D80009. 
AA121504. 
AA1 29555, 
AA402812 


Hs.108636 


C1orf9 


chromosome 1 open reading frame 9 


BE466870, N36176 


Hs.108689


SREBF2 


sterol regulatory element binding
transcription factor 2 


AA608556. 
AA701914, 
AA608556 


Hs.108809


CCT7 


chaperonin containing TCP1, subunit 7 (eta)


AA314436, 
AA6765a8 


Hs.110713 


DEK 


DEK oncogene (DNA binding)


AI888504, R25377 


Hs.1119




nuclear receptor subfamily 4, group A,
member 1 


NM 002135, 
N94487 


Hs.11355






U09087, H21746. 
AA676998, T63980 


Hs.115617




corticotropin releasing hormone-binding 
protein 


NM 001882, 
N26546. AA286752 


ris. It/ / 




solute carrier family 22 (organic cation 
transporter), member 1 


X98332. AA702013 


AA7A 




cyclin-dependent kinase inhibitor 2A
(melanoma, p16, inhibits CDK4)


AI859822, 
AA877595 




ARFGEF2 


ADP-ribosylation factor guanine nucleotide-
exchange factor 2 (brefeldin A-inhibited)


AA099582, N34053 


Hs.118638


NME1 


non-metastatic cells 1, protein (NM23A) 
expressed in 


AA147871, 
AA644092 


Hs.11902 


MYLE 


MYLE protein 


AA628977. T68845 


Hs.119651 


GPC3 


glypican 3


U50410, AA775872 


Lie i9in7 
ris« 1^ III/ 




putative breast adenocarcinoma marker
(32kD) 


AF042384, N25578 


Hs.12482


GNPAT 


glyceronephosphate O-acyltransferase


AF043937. 
W72079 


Hs.125180 


GHR 


growth hormone receptor 


X06562, N70358,
AA775738 


Hs.13340


HAT1 


histone acetyltransferase 1 


AF030424, 
AA625662 


Hs.148495


PSMD4 


proteasome (prosome, macropain) 26S 
subunit, non-ATPase. 4 


AA604027, 
AA450227 


Hs.151787


U5-116KD 


U5 snRNP-specific protein, 116 kD


D211d3, 


Hs.152931


LBR 


lamin B receptor 


L25931. AA099136 


Hs.15318


HAX1 


HS1 binding protein


BE260953. R76263 




Uv? I rvCZI- 1 


UDP-galactose transporter related


AW1 92554. 
R41839 


Hs.155079 


PPP2R5A 


protein phosphatase 2, regulatory subunit B 
(B56), alpha isoform 


AA234460, R59164 


Hs.155637


PRKDC 


protein kinase, DNA-activated, catalytic
polypeptide 


U34994, R27615 


Hs.156110


IGKC 


immunoglobulin kappa constant 


AW404507. 
AI732289, 
AA476918, 
AA486362 


Hs.1600 


CCT5 


chaperonin containing TCP1, subunit 5


D43950. 






WO 2004/108964



PCT/SG2004/000166 









AA1 26599 
AA629692 


Hs.1624 


EFNA1 


ephrin-A1


NM 004428 
AA857015 


Hs.16341


MAWBP


MAWD binding protein


AI866254, R54416


Hs.16426 


PODXL 


podocalyxin-like


BE395330, N64508 


Hs.1657 


ESR1 


estrogen receptor 1 


AA1 64585. 
AA291702 


Hs.166468 


PDCD5


programmed cell death 5


AA452724. 
AA156940 


Hs.166891


RFX5 


regulatory factor X, 5 (influences HLA class II
expression)


AL050135. 
AA41 8045 


Hs.1674 


GFPT1 


glutamine-fructose-6-phosphate
transaminase 1


NM_002056. 
AA47a571 


Hs.1 69407 


SACM2L 


SAC2 (suppressor of actin mutations 2,
yeast, homolog)-like


AK001725. 


Hs.1 73274 


ICAP-1A 


integrin cytoplasmic domain-associated 
protein 1 


AF012023, 


Hs.1 74140 


ACLY 


ATP citrate lyase 


AW967351. 
H08547. AA1 26708 


Hs.174220


CYP2C8 


cytochrome P450, subfamily IIC 
(mephenytoin 4-hydroxylase), polypeptide 8


M17398, N53136


Hs.1 77592 


RPLP1 


ribosomal protein, large, P1


AW963733, 
AI732304 


Hs.180414 


HSPA8


heat shock 70kD protein 8


AVv^4yuiU, 
H64096. AA629567 


Hs.180446 


KPNB1 


karyopherin (importin) beta 1 


L38951. 

MAX 1^1 * O^, 

AA251527. 

AA4?5006 


Hs.180577




granulin


AI375908, 

AI054019 

AA496452 


ns. 1 oi/o 1 u 




splicing factor proline/glutamine rich 
(polypyrimidine tract-binding protein-associated)


X70944, R96240. 
N24024. AA425258 


Hs.1 81 357 


LAMR1 


laminin receptor 1 (67kD, ribosomal protein 


AW328280. 
AA629897 


Hs.181444 


LOC51235 


hypothetical protein 


AI190653, 
AA455565 


Hs.1 84222 


DSCR1 


Down syndrome critical region gene 1


U85267, AA629707 


Hs.1 8443 


ALDH8A1 


aldehyde dehydrogenase 8 family, member
A1


AI051566. N70701 


Hs.1 94673 


PEA15 


phosphoprotein enriched in astrocytes 15 


Y13736. AA293211 


Hs.1 989 


SRD5A2 


steroid-5-alpha-reductase, alpha polypeptide
2 


M74047. AI420552 


Hs.199067 


ERBB3 


v-erb-b2 avian erythroblastic leukemia viral
oncogene homolog 3


AI565773, N24966. 
AA042878 


Hs.1 99263 


MT1L 
STK39 


metallothionein 1L

serine threonine kinase 39 (STE20/SPS1 
homolog, yeast)


F26137. H84871 


Hs.2 


NAT2 


N-acetyltransferase 2 (arylamine N- 
acetyltransferase) 


D90040. Ai262683 


Hs.20144 


SCYA14 


small inducible cytokine subfamily A (Cys- 
Cys), member 14


NM 004166. 
R96626 
Hs.20716


TIM17 


translocase of inner mitochondrial membrane 
17 homolog A (yeast)


AW247564. 
AA708446 


Hs.22785 


GABRE


gamma-aminobutyric acid (GABA) A
receptor, epsilon


NM 004961. 
H63532 


Hs.236774


HMG17L3 


high-mobility group (nonhistone
chromosomal) protein 17-like 3




Hs.236828


WHIP 


Werner helicase interacting protein


AA481600. 
AA188168 


Hs.237356 


SDF1 


stromal cell-derived factor 1 


L36033, AA447115 


Hs.23767 


FLJ12666 


hypothetical protein FLJ12666


AW952494. 
AA432056 


Hs.24485 


CSPG6 


chondroitin sulfate proteoglycan 6 (bamacan) 


NM 005445. 

W40150. 

AA463410 




HNRPH1


heterogeneous nuclear ribonucleoprotein H1
(H) 


BE296051. 
R11018.W9»}58 




ivir\r^o 1^ 


miiocnonuiioi noosomai proiein oih- 


AW973521. 
T51290, AA460831 






regulator of G-protein signalling 5


AI674877. N34362. 
AA668470 


Hs.25132 


KIAA0470


KIAA0470 gene product 


NM 014812. 
AI049669. 
AA167129. 
AA187982 


Hs.252229


MAFG 


v-maf musculoaponeurotic fibrosarcoma 
(avian) oncogene family, protein G


AF059195, 
N21609. AA045436 


Hs.25647


FOS


v-fos FBJ murine osteosarcoma viral
oncogene homolog


V01512. R12840, 
N36944. AA485377 


Hs.25797


SF3B4 


splicing factor 3b, subunit 4, 49kD


NM 005850. 
AA699361 


Hs.26403 


GSTZ1 


glutathione transferase zeta 1
(maleylacetoacetate isomerase)


U86529. AA428334 


Hs^6433 


DPAGT1 


dolichyl-phosphate (UDP-N- 
acetylglucosamine) N- 
acetylglucosaminephosphotransferase 1
(GlcNAc-1-P transferase)


Z82022. R55619. 
AA452517 


Hs.271980


MAPK6 


mitogen-activated protein kinase 6


NM 002748. 
AA603152. HI 7504 


Hs.275163 


NME2 


non-metastatic cells 2, protein (NM23B)

expressed in 


Li 6785. 

AA422058, 

AA4g6512 


Hs.287797


ITGB1 


integrin, beta 1 (fibronectin receptor, beta
polypeptide, antigen CD29 includes MDF2, 
MSK12) 

Homo sapiens, clone MGC:17220


W38716 

AA037283, 

W67173 


Hs.291904 


DXS1357E 


accessory proteins BAP31/BAP29 


Z31696. AA625628 


Hs.2934


RRM1 


ribonucleotide reductase M1 polypeptide


X59543. AA633549 


Hs.293441 




Homo sapiens SNC73 protein (SNC73) 
mRNA, complete cds 


AA290845, 
H28469. H73590 


Hs.296341 


CAP2 


adenylyl cyclase-associated protein 2


AW779995. 
AA040613 


Hs.300697 


IGHG3 


immunoglobulin heavy constant gamma 3 
(G3m marker) 


078345. 
AA740786, 
N92646. AA465378 


Hs.301005 


H2AV 


histone H2A.F/Z variant 


BE409809. H97000 


Hs.301404 


RBM3 


RNA binding motif protein 3 


NM 006743. 
Hs.301819


ZNF146 


zinc finger protein 146


X70394. AA504351 


Hs.3041 


UNG2 


uracil-DNA glycosylase 2


AA291356. 


Hs.3164 


NUCB2 


nucleobindin 2 


AW951523, 
vvyoyD4, 


Hs.321231 


B4GALT3 


UDP-Gal:betaGlcNAc beta 1.4- 
galactosyltransferase, polypeptide 3


Y12509, AA424578 


Hs.323817 


DKFZP547E 
101 


DKFZP547E1010 protein


NM_015607, 
AA418004 


Hs.332633 


BBS2 


Bardet-Biedl syndrome 2


AA486738 


Hs.333495 


DSS1 


Deleted in split-hand/split-foot 1 region 


Wf sAIDf . riOOHO*!' 


Hs.334612 


SNRPE 


small nuclear ribonucleoprotein polypeptide
E 


X12466. AA678021 


Hs.334787 


MGC19556 


hypothetical protein MGC19556 


Bt379431 , 
AAoUy4Do 


Hs.342389 


PPIA 


peptidylprolyl isomerase A (cyclophilin A)


AW732921. 
n7zo74 


Hs.349961 


RPL6 


ribosomal protein L6 


AW675430. 


Hs.356525 


FLJ12806 


ESTs, Weakly similar to CNG1_HUMAN 
cGMP-gated cation channel alpha 1 (CNG
channel alpha 1) 




Hs.3610 


KIAA0205 


KIAA0205 gene product


D86960. R91263 


Hs.36102 




ESTs. Highly similar to SMHU1B 
metallothionein IB [H.sapiens]


R99207. H72722 


Hs.4 


ADH1B


alcohol dehydrogenase IB (class I), beta
polypeptide 


M24317. N93428 


Hs.41587 


RAD50 


RAD50 (S. cerevisiae) homolog


275311. H99196, 

MMIZOhOZ 


Hs.431 


BMI1 


murine leukemia viral (bmi-1) oncogene
homolog


AA884913, 

MMDUOOOO . 

T87514, W90704. 


Hs.44532 


UBD 


diubiquitin 


NMJ006398. 


Hs.44585 


TP53BP2 


tumor protein p53-binding protein, 2 


AI123916. H69077. 


Hs.46440 


SLC21A3 


solute carrier family 21 (organic anion 
transporter), member 3


U21943. N62948 


Hs.4756 


FEN1 


flap structure-specific endonuclease 1


BE278623. 


Hs.50758


SMC4L1


SMC4 (structural maintenance of 
chromosomes 4, yeast)-like 1


AB019987. 


Hs.5085 


DPM1


dolichyl-phosphate mannosyltransferase
polypeptide 1, catalytic subunit


AW1 73486. 
AA004759 


Hs.52002


CD5L 


CD5 antigen-like (scavenger receptor
cysteine rich family) 


NM 005894. 
AA677254 


Hs.554 


SSA2 


Sjogren syndrome antigen A2 (60kD,
ribonucleoprotein autoantigen SS-A/Ro)


NM 004600. 
AA0ld351 


Hs.5662 


GNB2L1


guanine nucleotide binding protein (G
protein), beta polypeptide 2-like 1 


AA640657. R96220 


Hs.57101 


MCM2 


minichromosome maintenance deficient (S.


BE250461. 
cerevisiae) 2 (mitotin)


AA454572 


Hs.5737 


KIAA0475 


KIAA0475 gene product 


AA524523, N73927 


Hs.57783


EIF3S9 


eukaryotic translation initiation factor 3, 
subunit 9 (eta 116kD) 


AA676471 


Hs.6127 




Homo sapiens cDNA: FLJ23020 fis, clone
LNG00943 


AA054768. T67278 


Hs.6551 


ATP6S1 


ATPase, H+ transporting, lysosomal 
interacting protein 1 


NM_001183, 
AA487588 


Hs6G50 


VPS45B 


vacuolar protein sorting 45B (yeast homolog) 


AA702845. 
AA885433 


Hs.6838


ARHE 


ras homolog gene family, member E


W03441,W86282. 
AA443302 


Hs.695 


CSTB 


cystatin B (stefin B) 


AI831499. 110374 


Hs.699 


PPIB 


peptidylprolyl isomerase B (cyclophilin B)


BE386706. 
N45313. AA481464 


Hs.69997 


ZNF238 


zinc finger protein 238 


AJ223321.R79722 


Hq TAAA'i 




chromodomain helicase DNA binding protein
4 




Hs.75117


ILF2


interleukin enhancer binding factor 2, 45kD


AA307289. 
AA894687. H95638 


Hs.75183 


CYP2E 


cytochrome P450, subfamily IIE (ethanol-
inducible)


J02843. H50500 


Hs.75187


KIAA0016 


translocase of outer mitochondrial 
membrane 20 (yeast) homolog 


D13641. AA644550 


Hs.75258 


H2AFY 


H2A histone family, member Y 


AA307460, 
AA486063 


Hs.75354


GCN1L1 


GCN1 (general control of amino-acid
synthesis 1, yeast)-like 1


D86973, R55250 


Hs.75412 


ARMET 


arginine-rich, mutated in early stage tumors 


AA582041.R91550 


Hs.75424


ID1


inhibitor of DNA binding 1, dominant
negative helix-loop-helix protein


S78825 AA4571S8 


Hs.75546 


CAPZA2 


capping protein (actin filament) muscle Z-
line, alpha 2 


U03851.AA083228 


Hs.75659


MPV17 


MpV17 transgene, murine homolog, 
glomerulosclerosis


NM 002437. 
R55046 


Hs.75678


FOSB 


FBJ murine osteosarcoma viral oncogene 
homolog B 


L49169, T61948


Hs.75981 


USP14 


ubiquitin specific protease 14 (tRNA-guanine 
transglycosylase) 


NM 005151. 
AA03951i.T65861 


Hs.76230 


RPS10 


ribosomal protein S10 


AW245775. 
AA828564, 
AA828819. 
AI054003 


Hs.76285


DKFZP564B 
167 


DKFZP564B167 protein


AI032331. 
AA621342 


Hs.76325 


IGJ 


immunoglobulin J polypeptide, linker protein
for immunoglobulin alpha and mu
polypeptides 

Homo sapiens, clone MGC: 24130 


AW1 72754, 
T90492, T70057


Hs.7655 


U2AF65 


U2 small nuclear ribonucleoprotein auxiliary
factor (65kD)


AA936430, 
AA405748 


Hs.7720 


DNCH1 


dynein, cytoplasmic, heavy polypeptide 1


AB002323. 
AA010589. 
W78967 


Hs.77254 


CBX1 


chromobox homolog 1 (HP1 beta homolog
Drosophila)


AL046741. 
AA448667 
Hs.77326


IGFBP3 


insulin-like growth factor binding protein 3 


BE336944. 
AA598601 


Hs.77608 


SFRS9 


splicing factor, arginine/serine-rich 9


AL021546, 
N47892, AA490721 


Hs.78065 


C7 


complement component 7 




Hs.78902 


VDAC2 


voltage-dependent anion channel 2 


AA857093.T66813 


Hs.79090 


XPO1


exportin 1 (CRM1, yeast, homolog)


D89729, T59055


Hs.79110 


NCL 


nucleolin


AK000250. 


Hs.79150


CCT4 


chaperonin containing TCP1, subunit 4
(delta) 


U38S46, T98634. 
AA598637 


Hs.79162 


SSRP1 


structure specific recognition protein 1




Hs.80343 


MMP15 


matrix metalloproteinase 15 (membrane-
inserted) 


D85510. AA443300 


Hs.80552 


DPT 


dermatopontin 


/Mr W 1 1 y 


Hs.809 


HGF 


hepatocyte growth factor (hepapoietin A; 
scatter factor) 


X16323, R52797


Hs.80917 


AP3S1 


adaptor-related protein complex 3, sigma 1 
subunit 


D63643. AA996044 


Hs.80919 


SYPL 


synaptophysin-like protein


S72481. AA427447 


Hs.81972 


SHC1 


SHC (Src homology 2 domain-containing) 
transforming protein 1 


X68148, R52960,

T50498 


Hs.82043 


D123 


D123 gene product




Hs.82159 


PSMA1 


proteasome (prosome, macropain) subunit,
alpha type, 1


AI889267. R27S85 


Hs.82793 


PSMB3 


proteasome (prosome, macropain) subunit,
beta type, 3 


AI028114. 


Hs.82916


CCT6A 


chaperonin containing TCP1, subunit 6A
(zeta 1)


L27706 

AA872690, H84286 


Hs.83753 


SNRPB 


small nuclear ribonucleoprotein polypeptides
B and B1


BE252108, 
AA599116 


Hs.84790 


C7orf14 


chromosome 7 open reading frame 14 


D86978. AA600190 


Hs.85119 


SMT3H1


SMT3 (suppressor of mif two 3, yeast) 
homolog 1


AA1 60893. 
AA862529 
AA872379 


Hs.8765 


RNAHP 


RNA helicase-related protein


AI814448.T56221 


Hs.8867 


CYR61


cysteine-rich, angiogenic inducer, 61


Y12084. AA777187 


Hs.89525 


HDGF 


hepatoma-derived growth factor (high-
mobility group protein 1-like)


AA453749 


Hs.90093 


HSPA4 


heat shock 70kD protein 4 


AA131267, 
AA4339t6 


Hs.90370 


ARPC1A 


actin related protein 2/3 complex, subunit 1 A 
(41 kD) 


Y08999. 
AA490209. 
AA016251, 
AA151930 


Hs.90744 


PSMD11 


proteasome (prosome, macropain) 26S 
subunit, non-ATPase, 11


AB003102 


Hs.99969 


FUS 


fusion, derived from t(12;16) malignant 
liposarcoma 


BE396632. 101207 
In the third gene list, a total of 68 unique UniGenes from the 218 significant
gene list (containing 213 unique UniGenes) identified herein (Table 1) were observed
to overlap (Table 4), and the likelihood that the overlap would arise by chance if the
two gene lists were totally independent was minuscule (Pc<1x10'^).
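
The text does not say which test produced Pc. One common way to ask how likely an overlap of this size is between two independent gene lists is a hypergeometric tail probability, sketched below. The function name is hypothetical, and the universe size (10000) and second-list size (500) in the usage line are illustrative placeholders, not values from this document.

```python
from math import comb

def overlap_pvalue(universe, list_a, list_b, overlap):
    """P(X >= overlap) when list_b genes are drawn at random from `universe`
    genes, of which list_a belong to the first list (hypergeometric tail)."""
    upper = min(list_a, list_b)
    total = comb(universe, list_b)
    tail = sum(comb(list_a, k) * comb(universe - list_a, list_b - k)
               for k in range(overlap, upper + 1))
    return tail / total

# Illustrative call: 213 UniGenes in the first list, 68 shared,
# with an assumed universe of 10000 genes and an assumed second list of 500.
p_c = overlap_pvalue(10000, 213, 500, 68)
```

Under these assumed sizes the expected chance overlap is about 11 genes, so an observed overlap of 68 gives a very small tail probability.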



Table 4. Intersection of Significant Genes Identified Herein with HCC Genes



UniGene 
identifier 




Description


GenBank No. 


Hs.119651


GPC3 


glypican 3 


U50410. AA775872 


Hs.125180


GHR 


growth hormone receptor


X06562. N70358 


Hs.180577


GRN 


granulin


AI375908 
AA496452 


Hs.44585 


TP53BP2 


tumor protein p53-binding protein, 2


AI123916, H69077


Hs.77326


IGFBP3 


insulin-like growth factor binding protein 3


BE336944, 
AA598601 


Hs.8867 


CYR61 


cysteine-rich, angiogenic inducer, 61


Y12084, AA777187


Hs.1600 


CCT5 
HSEC61 


chaperonin containing TCP1, subunit 5
(epsilon)

sec 61 homolog


D43950. AA629692 


Hs.75410




heat shock 70kD protein 5 (glucose-
regulated protein, 78kD)


AL043206. 
AA962446 


Hs.152931 


LBR 


lamin B receptor 


L25931, AA099136






dynein, cytoplasmic, heavy polypeptide 1


AB002323, 
W78967 


Hs.4756




flap structure-specific endonuclease 1


BE278623. 
AA620553 


Hs.50758 


SMC4L1 


SMC4 (structural maintenance of 
chromosomes 4, yeast)-like 1
CAP-C chromosome associated
polypeptide C 


AB019987, 
AA452095 


Hs.77254 


CBX1 


chromobox homolog 1 (HP1 beta homolog
Drosophila)


AL046741. 
AA448667


Hs.2934 


RRM1 


ribonucleotide reductase M1 polypeptide 


X59543. AA633549 


Hs.156110 


IGKC 


Immunoglobulin kappa constant 


AW404507, 

AA402920. 

AA486362 


Hs.20144 


SCYA14 


small inducible cytokine subfamily A (Cys-
Cys), member 14


NM 004166, 
R96626 


Hs.237356 


SDF1 


stromal cell-derived factor 1 


L36033. AA447115 


Hs.78065 


C7 


complement component 7 


X86328. AA598478 


Hs.118638


NME1 


non-metastatic cells 1, protein (NM23A)
expressed in 


AA147871. 
AA644092 


Hs.12482 


GNPAT 


glyceronephosphate O-acyltransferase 


AF043937. 
AA486845 


Hs.174140


ACLY


ATP citrate lyase 


AW967351. 
H08547 


Hs.174220


CYP2C8 


cytochrome P450, subfamily IIC
(mephenytoin 4-hydroxylase), polypeptide
8 


M17398, N53136 


Hs.26403 


GSTZ1 


glutathione transferase zeta 1
(maleylacetoacetate isomerase)


U86529, AA428334


Hs.4


ADH1B 


alcohol dehydrogenase IB (class I), beta
polypeptide 


M24317. N93428 


Hs.177592


RPLP1 


ribosomal protein, large, P1


AW963733. 
AI732304 


Hs.25797 


SF3B4 


splicing factor 3b, subunit 4, 49kD


NM_005850,
AA699361 


Hs.76230 


RPS10 


ribosomal protein S10 


AW245775. 
AI054003 


Hs.76325 


IGJ 
SLU7 


immunoglobulin J polypeptide, linker 
protein for immunoglobulin alpha and mu 
polypeptides 

step II splicing factor SLU7


AW1 72754, 
T70057 


Hs.83753 


SNRPB 


small nuclear ribonucleoprotein
polypeptides B and B1 


BE252108,
AA599116 


Hs.115617 


CRHBP 


corticotropin releasing hormone-binding 
protein 


NM 001882. 
AA286752 


Hs.118249


ARFGEF2 
BIG2 


ADP-ribosylation factor guanine
nucleotide-exchange factor 2 (brefeldin A-
inhibited)

Brefeldin A-inhibited guanine nucleotide-
exchange protein 


AA099582, N34053 


Hs.155079


PPP2R5A 


protein phosphatase 2, regulatory subunit 
B (B56), alpha isoform 


AA234460. R59164 


Hs.155637


PRKDC 


protein kinase, DNA-activated, catalytic
polypeptide


U34994, R27615 


Hs.1624 


EFNA1


ephrin-A1


NM 004428, 
AA857016 


Hs.182278 


CALM2 


calmodulin 2 (phosphorylase kinase, 
delta) 


045887. AA043551 


Hs.199263 


STK39 
SPAK 


serine threonine kinase 39 (STE20/SPS1
homolog, yeast) 
ste-20 related kinase 


F26137. H84871 


Hs.22785 


GABRE 


gamma-aminobutyric acid (GABA) A
receptor, epsilon


NM_004961,
H63532


Hs.24950 


RGS5 


regulator of G-protein signalling 5 


AI674877, N34362,
AA668470


Hs.296341 


CAP2 


adenylyl cyclase-associated protein 2


AW779995, 

AA040613


Hs.81972 


SHC1 


SHC (Src homology 2 domain-containing)
transforming protein 1 


X68148, R52960, 
T50498


Hs.1119


NR4A1 


nuclear receptor subfamily 4, group A, 
member 1 


NM_002135, 
N94487


Hs.1657


ESR1 


estrogen receptor 1 


AL078582, 
AA291702 


Hs.166891 


RFX5 


regulatory factor X, 5 (influences HLA 
class II expression)


AL050135. 
AA41864S 


Hs.252229


MAFG 


v-maf musculoaponeurotic fibrosarcoma
(avian) oncogene family, protein G


AF059195. N21609 


Hs.25647 


FOS 


v-fos FBJ murine osteosarcoma viral 
oncogene homolog


V01512, N36944,
AA485377 


Hs.431 


BMI1 


murine leukemia viral (bmi-1) oncogene
homolog 


AA884913, 

W90704,

AA478036 


Hs.75117 


ILF2


interleukin enhancer binding factor 2,
45kD 


AA307289. 
H95638. AA894687 


Hs.75678 


FOSB 


FBJ murine osteosarcoma viral oncogene 


L49169. T61948 
homolog B




Hs.6551 


ATP6IP1 


ATPase, H+ transporting, lysosomal 
interacting protein 1


NM 001183. 
AA487588 


Hs.79090 


XPO1


exportin 1 (CRM1, yeast, homolog)


D89729, T59055 


Hs.80917 


AP3S1 


adaptor-related protein complex 3, sigma
1 subunit


D63643. AA996044 


Hs.80919 


SYPL 


synaptophysin-like protein 


S72481. AA427447 


Hs.44532 


UBD 


diubiquitin 


NM_006398, N33920


Hs.106061 


RDBP 


RD RNA-binding protein 


X16105. AA056390 


Hs.108636 


C1orf9
CH1


chromosome 1 open reading frame 9
membrane protein CH1


BE466870, N36176 


Hs.110713 


DEK 


DEK oncogene (DNA binding)


AI888504. R25377 


Hs.11355


TMPO 


Thymopoietin 
ESTs 




Hs.16341 


MAWBP 


MAWD binding protein 

ESTs weakly similar to predicted using 

genefinder fC. elegansi 


AI866254. R54416 


Hs.16426 


PODXL 


podocalyxin-like


BE395330. N64508 


Hs.18443 


ALDH8A1 


aldehyde dehydrogenase 8 family, 
member A1 

ESTs 


AI051566. N70701 


Hs.194673


PEA15 


phosphoprotein enriched in astrocytes 15 


Y13736, AA293211 


Hs.23767


FLJ12666


hypothetical protein FLJ12666

Homo sapiens cDNA FLJ12666 fis, clone


AW952494, 
H10192, 
AA1 15300. 
AA131466 


Hs.291904


DXS1357E


accessory proteins BAP31/BAP29 


Z31696. AA625628 


Hs.3610 


KIAA0205 


KIAA0205 gene product 


D86960, R91263


Hs.36102




ESTs, Highly similar to SMHU1B
metallothionein IB [H.sapiens]
ESTs highly similar to MT1B Human
Metallothionein-IB [H.sapiens]


R99207, H72722


Hs.6838


ARHE 


ras homolog gene family, member E


W03441, 
AA443302 


Hs.75187 


TOMM20- 
PENDI 


translocase of outer mitochondrial 
membrane 20 (yeast) homolog
KIAA0016 translocase of outer
mitochondrial membrane 20 (yeast)
homolog


D13641, AA644550 


Hs.8765 


RNAHP 


RNA helicase-related protein


AI814448.T56221, 
N55459 



The discriminator cassettes were assessed on an independent tissue set of 58
liver clinical biopsies from 29 patients. Using a kNN prediction algorithm, it was
found that all classifier probe cassettes could readily distinguish HCC tumor from
non-tumor liver (Table 5), and that the gene discriminators of tumor vs. non-tumor in
HCC derived by the intersect analysis of limited tissue sets can be validated in an
independent manner.
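
A k-nearest-neighbour (kNN) prediction of the kind referred to above can be sketched as follows. The text does not give the value of k or the distance measure, so k=3 and Euclidean distance are assumptions of this sketch, and the function name and toy expression values are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, sample, k=3):
    """Label `sample` by majority vote among its k nearest training samples."""
    # Pair each training sample's distance to `sample` with its label.
    dists = sorted((math.dist(x, sample), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy expression profiles over two classifier genes (illustrative values only).
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["non-tumor"] * 3 + ["tumor"] * 3
prediction = knn_predict(train_x, train_y, [5.5, 5.5])
```

In practice each row would hold the expression levels of the genes in one classifier cassette, and each independent biopsy would be passed in turn as `sample`.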
Table 5. Prediction accuracy of gene classifiers using kNN algorithm on 58 liver
biopsies from 29 patients.

Gene          No. of gene   Misclassification   No. of false      No. of false      Predictive
classifiers   classifiers   rate                negative cases*   positive cases†   accuracy

Table 1       218           4 of 58             4                 0                 93%
Table 2       265           3 of 58             3                 0                 95%
Table 3       166           3 of 58             3                 0                 95%
Table 4       68            2 of 58             1                 1                 96%

* False negative cases refer to HCC tumors which were misclassified as non-tumor livers.
† False positive cases refer to non-tumor livers which were misclassified as HCC tumors.
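
The predictive accuracy column of Table 5 appears to follow from the misclassification counts as the fraction of the 58 biopsies classified correctly. The short sketch below reproduces that arithmetic; the function name is hypothetical, and false-positive counts for the Table 1 to Table 3 classifiers are taken as zero because their misclassifications are fully accounted for by the false negatives.

```python
def predictive_accuracy(n_samples, false_neg, false_pos):
    """Fraction of samples classified correctly."""
    return (n_samples - false_neg - false_pos) / n_samples

# (false negatives, false positives) for each classifier on the 58 biopsies.
counts = {"Table 1": (4, 0), "Table 2": (3, 0), "Table 3": (3, 0), "Table 4": (1, 1)}
accuracy = {name: predictive_accuracy(58, fn, fp) for name, (fn, fp) in counts.items()}
```

For the Table 4 classifier this gives 56/58, about 96.6%, which the table reports as 96%.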
OTHER EMBODIMENTS

Although various embodiments of the invention are disclosed herein, many
adaptations and modifications may be made within the scope of the invention in
accordance with the common general knowledge of those skilled in this art. Such
modifications include the substitution of known equivalents for any aspect of the
invention in order to achieve the same result in substantially the same way. Accession
numbers, as used herein, may refer to Accession numbers from multiple databases,
including GenBank, the European Molecular Biology Laboratory (EMBL), the DNA
Database of Japan (DDBJ), or the Genome Sequence Data Base (GSDB), for
nucleotide sequences, and including the Protein Information Resource (PIR),
SWISSPROT, Protein Research Foundation (PRF), and Protein Data Bank (PDB)
(sequences from solved structures), as well as from translations from annotated
coding regions from nucleotide sequences in GenBank, EMBL, DDBJ, or RefSeq, for
polypeptide sequences. Accession numbers, as used herein, may also refer to
Accession numbers from databases such as UniGene, OMIM, LocusLink, or
HomoloGene. Numeric ranges are inclusive of the numbers defining the range. In the
specification, the word "comprising" is used as an open-ended term, substantially
equivalent to the phrase "including, but not limited to", and the word "comprises" has
a corresponding meaning. Citation of references herein shall not be construed as an
admission that such references are prior art to the present invention. All publications
are incorporated herein by reference as if each individual publication were
specifically and individually indicated to be incorporated by reference herein and as
though fully set forth herein. The invention includes all embodiments and variations
substantially as hereinbefore described and with reference to the examples and
drawings.